auto-corpus: automated and consistent outputs from research publications

yan hu ,a, shujian sun ,a, thomas rowlands , tim beck , ,b, and joram m. posma , ,b

section of bioinformatics, division of systems medicine, department of metabolism, digestion and reproduction, imperial college london, sw az, united kingdom
department of genetics and genome biology, university of leicester, le rh, united kingdom
health data research (hdr) uk, united kingdom
a these authors contributed equally.
b these authors contributed equally.

abstract

motivation: the availability of improved natural language processing (nlp) algorithms and models enables researchers to analyse larger corpora using open source tools. text mining of biomedical literature is one area for which nlp has been used in recent years with large untapped potential. however, in order to generate corpora that can be analyzed using machine learning nlp algorithms, these need to be standardized. summarizing data from literature to be stored into databases typically requires manual curation, especially for extracting data from result tables.

results: we present here an automated pipeline that cleans html files from biomedical literature. the output is a single json file that contains the text for each section, table data in machine-readable format and lists of phenotypes and abbreviations found in the article. we analyzed a total of , open access articles from pubmed central, from both genome-wide and metabolome-wide association studies, and developed a model to standardize the section headers based on the information artifact ontology. extraction of table data was developed on pubmed articles and fine-tuned using the equivalent publisher versions.

availability: the auto-corpus package is freely available with detailed instructions from github at https://github.com/jmp /autocorpus/.
information artefact ontology | natural language processing | text standardization

correspondence: timbeck [at] leicester.ac.uk and jmp [at] ic.ac.uk

introduction

natural language processing (nlp) is a branch of artificial intelligence that uses computers to process, understand and use human language. nlp is applied in many different fields including language modelling, speech recognition, text mining and translation systems. in the biomedical realm, nlp has been applied to extract, for example, medication data from electronic health records and patient clinical history from clinical notes, to significantly speed up the extraction of information that would otherwise be done manually by experts ( ). biomedical publications, unlike structured electronic health records, are semi-structured and this makes it difficult to extract and integrate the relevant information ( ). the format of research articles differs between publishers, and sections describing the same entity, for example statistical methods, can be found in different locations in the document in different publications. both unstructured text and semi-structured document elements, such as headings, main texts and tables, can contain important information that can be extracted using text mining ( ).

the development of the genome-wide association study (gwas) has been driven by the ongoing revolution in high-throughput genomic screening and a deeper understanding of the relationship between genetic variations and diseases/traits ( ). in a typical gwas, researchers collect data from study participants, use single nucleotide polymorphism (snp) arrays to detect the common variants among participants, and conduct statistical tests to determine if the association between the variants and traits is significant.
the results are mostly represented in publication tables, but can also be found in the main text, and there are multiple community efforts to store these reported associations in queryable, online databases ( , ). these efforts involve time-intensive and costly manual data curation to transcribe results from the publications, and supplementary information, into databases. summary-level gwas results are generally reported in the literature according to community norms (e.g. a snp associated with a phenotype with a probability value), hence nlp algorithms can be trained to recognize the formats in which data are reported to facilitate faster and scalable information extraction that is less prone to human error.

development of effective automatic text mining algorithms for gwas literature can also potentially benefit other fields in biomedical research as the body of biomedical literature grows every day. yet previous attempts at mining scientific literature focused mainly on information extraction from abstracts and some on the main text, while for the most part ignoring tables. to facilitate the process of preparing a corpus for nlp tasks such as named-entity recognition (ner), text classification or relationship extraction, we have developed an automated pipeline for consistent outputs from research publications (auto-corpus) as a python package.

hu and sun, et al. | biorχiv | january , | –
biorxiv preprint doi: https://doi.org/ . / . . . ; this version posted january , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license.

the main aims of auto-corpus are:
• to provide clean text outputs for each publication section with standardized section names
• to represent each publication’s tables in a javascript object notation (json) format to facilitate data import into databases
• to use the text outputs to find abbreviations used in the text

we exemplify the package on a corpus of , open access gwas publications whose data have been manually added to the gwas central database to list phenotypes, snps and p-values found in the cleaned text (figure ). in addition, we include data on , + metabolome-wide association studies (mwas) to ensure the methods are not biased towards one domain. mwas focus on small molecules, some of which are end-products of cellular regulatory processes, that are the response of the human body to genetic or environmental variations ( ).

materials and methods

data. hypertext markup language (html) files for , open access gwas publications whose data exist in the gwas central database ( ) were downloaded from pubmed central (pmc) in march . a further , open access publications of mwas on cancer, gastrointestinal diseases, metabolic syndrome, sepsis and neurodegenerative, psychiatric, and brain illnesses were also downloaded in the same format. publisher versions of ca. % of these publications were downloaded in july to test the algorithms on publications with different html formats. the gwas dataset was randomly divided into training publications to develop algorithms, and a test set of the remaining publications.

processing. html files were loaded using the beautifulsoup html parser package (v . . ). beautifulsoup was used to convert html files to tree-like structures with each branch representing an html section and each leaf an html element. after html files were loaded, all superscripts, subscripts, and italics were converted to plain text. auto-corpus extracts h , h and h tags for titles and headings, and p tags for paragraph texts using the default configuration.
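the extraction step can be illustrated with a minimal sketch. it is not the auto-corpus implementation: it uses the standard-library html.parser module as a stand-in for beautifulsoup so that it runs without extra dependencies, the class and function names are hypothetical, and because the heading levels are not legible in the text above it assumes h1 to h3 purely for illustration.

```python
from html.parser import HTMLParser


class HeadingParagraphParser(HTMLParser):
    """Collect the text of heading tags and <p> tags in document order."""

    # assumption: the exact heading levels are not recoverable from the text,
    # so h1-h3 are used here purely for illustration
    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.blocks = []    # list of (kind, text) tuples
        self._kind = None   # "heading" or "paragraph" while inside a block
        self._tag = None    # tag that opened the current block
        self._parts = []    # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        if self._kind is not None:
            # inline tags such as <sup>, <sub> or <i> are ignored,
            # so their content ends up as plain text in the block
            return
        if tag in self.HEADING_TAGS:
            self._kind, self._tag, self._parts = "heading", tag, []
        elif tag == "p":
            self._kind, self._tag, self._parts = "paragraph", tag, []

    def handle_data(self, data):
        if self._kind is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._kind is not None and tag == self._tag:
            # join fragments and normalise whitespace
            text = " ".join("".join(self._parts).split())
            self.blocks.append((self._kind, text))
            self._kind = self._tag = None


def extract_blocks(html):
    """Return [(kind, text), ...] for headings and paragraphs in an html string."""
    parser = HeadingParagraphParser()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

for example, extract_blocks("<h2>methods</h2><p>h<sub>2</sub>o was used.</p>") yields one heading block and one paragraph block in which the subscript has become plain text; the resulting list can be written to a file with json.dump.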
the headings and paragraphs are saved in a structured javascript object notation (json) file for each html file. tables are extracted from the document using a different set of configuration files (separate configurations for different table structures can be defined and used) and saved in a new json model that ensures tables of all formats and origins, not only restricted to gwas publications, can be described in the same structured model, so that these can be used as input to rule-based or deep learning algorithms for data extraction. the data cells are stored in the “result” key, and their corresponding section name and header names are stored in “section_name” and “columns” keys respectively. therefore, extracting relationships between cells only requires simple rules.

fig. . workflow of the auto-corpus package.

ontologies for entity recognition. the information artifact ontology (iao) was created to serve as a domain-neutral resource for the representation of types of information content entities such as documents, databases, and digital images ( ). we used the v - - model ( ) in which different terms exist that describe headers typically found in biomedical literature. the extracted headers in the json file were first mapped to the iao terms using the lexical owl ontology matcher ( ). we then used fuzzy matching with the fuzzywuzzy package (v . . ) to map headers to the preferred section header terms and synonyms, with a similarity threshold of . .
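a minimal sketch of threshold-based fuzzy header matching is given below. it is an illustration under stated assumptions rather than the package's code: difflib.SequenceMatcher from the standard library is swapped in for fuzzywuzzy, the term list is a hypothetical and heavily shortened stand-in for the iao model, and the threshold of 0.85 is an arbitrary placeholder because the value used in the study is not legible above.

```python
from difflib import SequenceMatcher

# hypothetical, shortened term list; the real iao model has many more
# section terms and synonyms
SECTION_TERMS = {
    "introduction section": ["introduction", "background"],
    "methods section": ["methods", "materials and methods", "experimental section"],
    "results section": ["results"],
}


def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def map_header(header, terms=SECTION_TERMS, threshold=0.85):
    """Return the best-matching section term, or None if below the threshold.

    The threshold default is a placeholder, not the value used in the study.
    """
    best_term, best_score = None, 0.0
    for term, synonyms in terms.items():
        for synonym in synonyms:
            score = similarity(header, synonym)
            if score > best_score:
                best_term, best_score = term, score
    return best_term if best_score >= threshold else None
```

with this term list, a misspelled header such as "experemintal section" is mapped to the methods section, while an unrelated header such as "full text" stays unmapped and returns None.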
this threshold was evaluated by two independent researchers, who confirmed that all matches were accurate.

after the direct iao mapping and fuzzy matching, unmapped headers still exist. to map these headings, we developed a new method using a directed graph (digraph) for representation since headers are not repeated within a document, are sequential and have a set order that can be exploited. digraphs consist of nodes (entities, headers) and edges (links between nodes), and the weight of the nodes and edges is proportional to the number of publications in which these are found. while digraphs from individual publications are acyclic, the combined graph can contain cycles, hence digraphs, as opposed to directed acyclic graphs, are used. unmapped headers are assigned a section based on the digraph and the headers in the publication that could be mapped (anchor points). for example, at this point in this article the main headers are ‘abstract’ followed by ‘introduction’ and ‘materials and methods’ that could make up a digraph. another article with headers ‘abstract’, ‘background’ and ‘materials and methods’ has two anchor points that match the digraph, and the unmapped header (‘background’) can be inferred from appearing in between the anchor points in the digraph (‘abstract’, ‘materials and methods’): ‘introduction’. we use this process to evaluate new potential synonyms for existing terms and identify new potential terms for sections found in biomedical literature.

we used the human phenotype ontology (hpo) to identify disease traits in the full texts. the hpo was developed with the goal to cover all common phenotypic abnormalities in human monogenic diseases ( ).

use cases: regular expression algorithms. abbreviations in the full text are found using an adaptation of a previously published methodology ( ) based on regular expressions using the abbreviations package (v . . ). in brief, the method first finds all brackets within a corpus.
if the number of words in a bracket is < it considers if it could be an abbreviation. it searches for the characters within the brackets in the text on either side of the brackets one by one. the first character of one of these words must match the first character within the bracket, and the remaining characters within the bracket must be contained in the words that follow the word whose first character matches the first character in the bracket. we combine the output of the package with abbreviations defined in the abbreviations section (if found) from the iao/digraph model.

for phenotype entity recognition, first any abbreviations in paragraphs extracted from the full text are replaced by their definition. this text is then tokenized using the spacy package (v . ) (model en_core_web_sm) and compared against phenotypes and their synonyms defined by hpo for disease trait matching. p-values and snps were identified in the full text and tables based on regular expressions as they have a standard form. pairs of p-value-snp associations are found in the text using dependency parse trees ( ).

use cases: deep learning-based named-entity recognition. the first example of a use case is to recognize the assay with which the data was acquired, however no models exist for this purpose. we fine-tuned a pre-existing model trained for biomedical ner, the biomedical bidirectional encoder representations from transformers (biobert) ( ), using part of our corpus where only mwas assays were tagged. we applied our fine-tuned model only on the paragraphs in the materials and methods sections to recognize the assays used. a second biobert-based model was fine-tuned on phenotypes, which already exist in the data, and enriched in phenotypes associated with the mwas publications. this model was applied on only the abstract and paragraphs from the results section.
the third example was applied only on paragraphs from the results and discussion sections using an existing model specifically trained to recognize chemical entities, chemlistem (v . . ) ( ).

use cases: paragraph classification. it is possible that unmapped headers are mapped to multiple sections if the anchor points are far apart. in order to test the applicability of a machine learning model to classify paragraphs we trained a random forest classifier on a dataset consisting of , abstract paragraphs and non-abstract paragraphs. % of the data was used for training and the remainder as the test set.

results

the order of sections in biomedical literature. a total of , headers were extracted from the , publications, mapped to iao (v - - ) terms and visualized by means of a digraph with unique nodes and directed edges (figure a). the major unmapped node is ‘associated data’, which is a header specific for pmc articles that appears at the beginning of each article before the abstract. the main structure of biomedical articles that were analyzed is: abstract → introduction → materials → results → discussion → conclusion → acknowledgements → footnotes section → references. iao has separate definitions for ‘materials’ (iao: ), ‘methods’ (iao: ) and ‘statistical methods’ (iao: ) sections, hence they are separate nodes in the graph, and introduction is also often followed by headers that reflect the methods section (and its synonyms). there is also a major directed edge from introduction directly to results, with materials and methods placed after the discussion and/or conclusion sections.
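the digraph idea can be sketched in a few lines. the code below is a simplified illustration, not the model used in auto-corpus: function names and the toy corpus are hypothetical, node and edge weights are plain publication counts, and an unmapped header is resolved only when a known section lies exactly one step between the two anchor headers.

```python
from collections import Counter


def build_digraph(publications):
    """Count nodes and directed edges over per-publication header sequences.

    Each node or edge is counted once per publication, so its weight equals
    the number of publications in which it occurs.
    """
    nodes, edges = Counter(), Counter()
    for headers in publications:
        nodes.update(set(headers))
        edges.update(set(zip(headers, headers[1:])))
    return nodes, edges


def infer_section(prev_anchor, next_anchor, edges):
    """Return known sections found between two anchors, most frequent first."""
    candidates = Counter()
    for (src, dst), weight in edges.items():
        if src == prev_anchor and (dst, next_anchor) in edges:
            # score a candidate by the weaker of its two connecting edges
            candidates[dst] = min(weight, edges[(dst, next_anchor)])
    return [section for section, _ in candidates.most_common()]


# toy corpus of already-mapped header sequences (illustrative only)
corpus = [
    ["abstract", "introduction", "materials and methods", "results"],
    ["abstract", "introduction", "materials and methods"],
    ["abstract", "introduction", "results"],
]
nodes, edges = build_digraph(corpus)
```

for an article with headers ‘abstract’, ‘background’ and ‘materials and methods’, calling infer_section("abstract", "materials and methods", edges) returns ["introduction"], mirroring the anchor-point example given in the methods. returning a ranked list rather than a single label keeps open the possibility of assigning several candidate sections when the anchors are further apart.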
all unmapped headers were investigated to evaluate whether some could be used as synonyms for existing categories. the digraph was also inspected by means of visualizing individual ego-networks which show the edges around a specific node mapped to an existing iao term. figure b shows the ego-network for abstract, and four main categories and one potential new synonym (precis, in red) were identified. the majority of unmapped headers (in purple), that follow the abstract, relate to a document that is written as one coherent whole, with specific headers for each section or a general header for the full/main text. an additional four unmapped headers relate to ‘materials and methods’ in their broader sense and these are data, data description, participants and sample. the remaining two categories of unmapped headers to/from abstract can be classified as new sections ‘graphical abstract’ and ‘highlights’. these headers were found alongside, and appear to be distinct from, the (textual) abstract.

based on the digraph, we then assigned data and data description to be synonyms of the materials section, and participants and sample as a new category termed ‘participants’ which is related to, but deemed distinct from, the existing patients section (iao: ). the same process was applied to ego-networks from other nodes linked to existing iao terms to add additional synonyms to simplify the digraph. figure c shows the resulting digraph with only existing and newly proposed section terms.

new proposed elements for the iao. each existing iao term contains one or more synonyms and extracted headers were first mapped directly to these terms. any headers that could not be mapped directly are mapped in the second step using fuzzy matching (e.g. the typographical error ‘experemintal section’ in pmc is correctly mapped to the methods section).
the last step involves mapping remaining unmapped headers to existing terms based on the digraph and using the structure (anchor headers) of the publication. headers that can be mapped to existing terms in the second and third steps are included as synonyms in the model. the existing categories for which new potential synonyms were identified are listed in table a and b with their existing synonyms and newly identified synonyms.

from the analysis of ego-networks four new potential categories were identified: disclosure, graphical abstract, highlights and participants. table details the proposed definition and synonyms for these categories. in the digraph in figure c the disclosure section is located towards the end of a publication and in some instances is followed by the conflict of interest section.

table data extraction with different configurations. pmc articles are standardized which makes data extraction more straightforward, however some publications are not deposited into pmc or other repositories and can only be found via publisher websites. while the package has been developed using a large set of pmc articles, we compared the auto-corpus output for pmc articles with the output for the equivalent articles made available by the publishers. we found no differences in how headers were extracted and paragraphs were classified based on the digraph. however, the representation of tables does differ substantially between publishers, hence a model developed on pmc articles alone will fail to extract the data. we circumvent this issue by defining configuration files for different table formats and we compare the accuracy of the data represented in the json format (figure ) between pmc and publisher versions of the same papers.

using the default (pmc) configuration on non-pmc articles none of the tables are represented accurately in the json.
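to make the table json model more concrete, the sketch below builds a small table object from csv text and then pairs every data cell with its column header. only the key names "columns", "section_name" and "result" come from the description in the methods; the nesting under a "section" list, the function names and the example values (including the snp identifiers) are assumptions made for illustration and may differ from the actual auto-corpus output.

```python
import csv
import io


def csv_to_table_json(csv_text, section_name="", delimiter=","):
    """Build a table dictionary from csv/tsv text (hypothetical layout)."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    return {
        "columns": header,
        # assumption: data rows are grouped per table section
        "section": [{"section_name": section_name, "result": data}],
    }


def cells_by_column(table):
    """Pair every data cell with its column header, one dict per row."""
    records = []
    for section in table["section"]:
        for row in section["result"]:
            records.append(dict(zip(table["columns"], row)))
    return records


# made-up example values, for illustration only
example_csv = "snp,p-value\nrs0000001,5e-8\nrs0000002,0.03\n"
table = csv_to_table_json(example_csv, section_name="discovery cohort")
records = cells_by_column(table)
```

once the cells are stored next to their headers in this way, relating a cell to its column is a simple dictionary lookup, which is the sense in which only simple rules are needed; the resulting dictionary can be serialised with json.dumps.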
auto-corpus allows a variety of configuration files (a single file, or all as a batch) to be used to extract data from tables. one configuration file, different to the default, correctly represented the data in json format for % ( ) of tables. the remaining tables could be represented correctly using different configuration files. when the right configuration file is used for non-pmc articles, all tables ( %) are represented identically to the json output from the matching pmc version.

use cases. the extracted paragraphs were classified as one (or more) categories based on the digraph. this is the purpose of the auto-corpus package: to prepare a corpus for analysis so that different sections can be used for specific purposes. we detail how these standardized texts can be used for entity recognition.

paragraph classification. while many headers can be mapped using fuzzy matching plus the digraph structure, some headers remain unmapped (e.g. the headers in purple in figure b: full text, main text, etc.) while others can be assigned to multiple (possible) sections. the choice of assigning multiple categories to unmapped headers based on the digraph is deliberate, as it ensures the algorithm does not wrongly assign a header to only one category (e.g. ‘materials’ over ‘methods’). the next step is to perform the paragraph classification using nlp algorithms to learn from the word usage and context. we show that random forests can be used to this end by training one to distinguish between abstracts and other paragraphs. paragraphs from the test set were predicted using a random forest trained on , paragraphs. for the test set, we obtained an f -score of . for classifying abstracts (precision = . , recall = . ) and . for classifying non-abstracts (precision = . , recall = . ).

abbreviation identification. the abbreviation detection algorithm searches through each paragraph using a rule-based approach to find all abbreviations used.
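a simplified sketch of bracket-based abbreviation detection follows. it is not the abbreviations package or the adapted algorithm used in auto-corpus: it only looks at the text before each bracket, takes the shortest long form that fits, and uses a hypothetical max_words limit because the word limit is not legible in the methods text.

```python
import re


def is_subsequence(chars, text):
    """True if all chars occur in text in the given order."""
    remaining = iter(text)
    return all(c in remaining for c in chars)


def find_abbreviations(text, max_words=2):
    """Map bracketed short forms to candidate long forms preceding them.

    max_words is a placeholder limit on the number of words in a bracket.
    """
    found = {}
    for match in re.finditer(r"\(([^()]+)\)", text):
        short = match.group(1).strip()
        if not short or len(short.split()) > max_words:
            continue
        letters = [c.lower() for c in short if c.isalnum()]
        if not letters:
            continue
        words = re.findall(r"[\w-]+", text[:match.start()])
        # try the shortest candidate long form first
        for n in range(1, min(len(words), 2 * len(letters)) + 1):
            candidate = words[-n:]
            # the first word must start with the first character in the bracket
            if candidate[0][0].lower() != letters[0]:
                continue
            # remaining characters must appear, in order, in the words that follow
            if is_subsequence(letters[1:], " ".join(candidate).lower()[1:]):
                found[short] = " ".join(candidate)
                break
    return found
```

applied to the sentence "natural language processing (nlp) is applied to genome-wide association study (gwas) publications (figure 2).", the sketch pairs nlp and gwas with their long forms and ignores the figure reference, since no preceding word starts with the first character in that bracket.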
auto-corpus then investigates whether a paragraph is mapped to the abbreviations category and, if found, it combines these two lists of abbreviations found in the publication. for example, when applied on an mwas publication ( ) which contains a header titled “abbreviations” the algorithm combines the abbreviations listed by the authors with a further identified from the text (figure ), including an abbreviation used with two spellings in the text.

fig. . digraph generated from analyzing section headers from , open access publications from pubmed central. (a) digraph of the v - - iao model consists of unique nodes, of which could be directly mapped to section terms (in orange) and the remainder are unmapped headers (in grey), and directed edges. relative node sizes and edge widths are directly proportional to the number of publications with these (subsequent) headers. blue edges indicate the edge with the highest weight from the source node, edges that exist in fewer than % of publications are shown in light grey and the remainder in black. (b) unmapped nodes connected to ‘abstract’ as ego node, excluding corpus specific nodes, grouped into different categories. unlabeled nodes are titles of paragraphs in the main text. (c) final digraph model used in auto-corpus to classify paragraphs after fuzzy matching. this model includes new (proposed) section terms and each section contains new synonyms identified in this analysis.
‘associated data’ is included as this is a pmc-specific header found before abstracts and can be used to indicate the start of most articles.

rule-based extraction of gwas summary-level data. gwas central relies on curated data extracted manually from publications or other databases. we investigated whether a rule-based approach to recognize phenotypes, snps and p-values can correctly identify data from publications contained within the database. applying the hpo to the gwas publications from the test set with a rule-based approach identified a total of , unique disease traits (major and minor) in these publications. traits are recorded for these publications in gwas central and the rule-based approach found with a perfect match. for % of the publications all traits were correctly identified. snps have standardized formats, hence rule-based approaches are well suited for their identification. likewise, p-values in gwas publications are typically represented using scientific notation and can also be identified using rule-based methods. a total of , snp/p-value pairs were found across the main text and tables of the publications. for . % of publications all associations recorded in the gwas central database are also found using this approach. while . % of these publications present results (snp/p-value pairs) only in tables, and . % of pairs are found in tables, associations were identified from the main text that are not represented in tables. , pairs match those recorded in the database (total of , pairs for these publications), however many associations in the database are not represented in main text/tables but in supplementary materials. auto-corpus includes a separate function to convert csv/tsv data to table json format (figure ), as summary-level results are often saved in these file formats as part of the supplementary information.

named-entity recognition.
three different deep learning models were used for ner on specific paragraphs of publications. a pre-trained biomedical entity recognition algorithm ( ) was fine-tuned using the results from the rule-based approach applied on gwas data. example sentences that contain hpo terms were used to fine-tune the transformer model and then applied on mwas publications from four broad and distinct phenotypes (cancer, gastrointestinal diseases, metabolic syndrome, and neurodegenerative, psychiatric and brain illnesses). the fine-tuned deep learning algorithm obtained accuracies between . and . , averaging around . % (table ).

we then fine-tuned the same base model for recognizing assays in text by training on sentences identified from the text that contain assays routinely used in mwas. the first pass consisted of a rule-based approach, with fuzzy matching, to find sentences with terms and these were then used to fine-tune the deep learning model. figure shows the resulting output in json format for one mwas publication ( ).
/ category (iao identifier) existing synonyms (iao v - - ) new synonyms identified a abstract (iao: ) abstract precis acknowledgements (iao: ) acknowledgements, acknowledgments acknowledgement, acknowledgment, acknowledgments and disclaimer author contributions (iao: ) author contributions, contributions by the authors authors’ contribution, authors’ contributions, authors’ roles, contributorship, main authors by consortium and author contributions discussion (iao: ) discussion, discussion section discussions footnote (iao: ) endnote, footnote footnotes introduction (iao: ) background, introduction introductory paragraph methods (iao: ) experimental, experimental procedures, experimental section, materials and methods, methods analytical methods, concise methods, experimental methods, method, method validation, methodology, methods and design, methods and procedures, methods and tools, methods/design, online methods, star methods, study design, study design and methods references (iao: ) bibliography, literature cited, references literature cited, reference, references, reference list, selected references, web site references supplementary material (iao: ) additional information, appendix, supplemental information, supplementary material, supporting information additional file, additional files, additional information and declarations, additional points, electronic supplementary material, electronic supplementary materials, online content, supplemental data, supplemental material, supplementary data, supplementary figures and tables, supplementary files, supplementary information, supplementary materials, supplementary materials figures, supplementary materials figures and tables, supplementary materials table, supplementary materials tables table a. newly identified synonyms for existing iao terms ( xx) from the digraph mapping of , publications. 
elements in italics have previously been submitted by us for inclusion into iao and added in the latest release (v - - ). lastly, we applied a domain specific algorithm for recogniz- ing chemical entities in the text and tables ( ) to identify metabolites in the same publication (figure ). discussion the analysis of our corpus of , open access publica- tions has resulted in identifying well over new synonyms for existing terms used in biomedical literature to indicate what a paragraph is about. in addition, we identified four new potential categories not previously included in the iao. we previously submitted a subset of synonyms reported here and one of the new categories for inclusion in the iao. these have been accepted by the iao and are included in the lat- est release (v - - ), hence we presented our analyses using the previous version of iao that does not include part of our work. in the latest release, the ‘graphical abstract’ section has been added (iao: ) based on our contri- bution. also, a new ‘research participants’ (iao: ) section has been added as contribution by others in the same release; therefore synonyms found here for the new category ‘participants’ section will be proposed in future as synonyms for the ‘research participants’ section. while the disclosure section appears to be distinct from the conflict of interest sec- tion due to a directed edge in the digraph, its synonyms could also be proposed to be part of the existing conflict of interest section in iao. standardization of text for nlp is an important step in preparing a corpus. auto-corpus outputs a json file of cleaned text, with standardized headers as well as all data presented in tables in json format. standardizing headers is important because some sections are more important than | biorχiv hu and sun, et al. | auto-corpus .cc-by-nc-nd . 
category (iao identifier) | existing synonyms (iao v - - ) | new synonyms identified a
abbreviations (iao: ) | abbreviations, abbreviations list, abbreviations used, list of abbreviations, list of abbreviations used | abbreviation and acronyms, abbreviation list, abbreviations and acronyms, abbreviations used in this paper, definitions for abbreviations, glossary, key abbreviations, non-standard abbreviations, nonstandard abbreviations, nonstandard abbreviations and acronyms
author information (iao: ) | author information, authors’ information | biographies, contributor information
availability (iao: ) | availability, availability and requirements | availability of data, availability of data and materials, data archiving, data availability, data availability statement, data sharing statement
conclusion (iao: ) | concluding remarks, conclusion, conclusions, findings, summary | conclusion and perspectives, summary and conclusion
conflict of interest (iao: ) | competing interests, conflict of interest, conflict of interest statement, declaration of competing interests, disclosure of potential conflicts of interest | authors’ disclosures of potential conflicts of interest, competing financial interests, conflict of interests, conflicts of interest, declaration of competing interest, declaration of interest, declaration of interests, disclosure of conflict of interest, duality of interest, statement of interest
consent (iao: ) | consent | informed consent
ethical approval (iao: ) | ethical approval | ethics approval and consent to participate, ethical requirements, ethics, ethics statement
funding source declaration (iao: ) | funding, funding information, funding sources, funding statement, funding/support, source of funding, sources of funding | financial support, grants, role of the funding source, study funding
future directions (iao: ) | future challenges, future considerations, future developments, future directions, future outlook, future perspectives, future plans, future prospects, future research, future research directions, future studies, future work | outlook
materials (iao: ) | materials | data, data description
statistical analysis (iao: ) | statistical analysis | statistical methods, statistical methods and analysis, statistics
study limitations (iao: ) | limitations, study limitations | strengths and limitations, study strengths and limitations

table b. newly identified synonyms for existing iao terms ( xx) from the digraph mapping of , publications. elements in italics have previously been submitted by us for inclusion into iao and added in the latest release (v - - ).
proposed category | proposed definition | proposed synonyms
disclosure | “a part of a document used to disclose any associations by authors that might be perceived as to potentially interfere with or prevent them from reporting research with complete objectivity.” | author disclosure statement, declarations, disclosure, disclosure statement, disclosures
graphical abstract | “an abstract that is a pictorial summary of the main findings described in a document.” | central illustration, graphical abstract, toc image, visual abstract
highlights | “a short collection of key messages that describe the core findings and essence of the article in concise form. it is distinct and separate from the abstract and only conveys the results and concept of a study. it is devoid of jargon, acronyms and abbreviations and targeted at a broader, non-technical audience.” | author summary, editors’ summary, highlights, key points, overview, research in context, significance, toc
participants | “a section describing the recruitment of subjects into a research study. this section is distinct from the ‘patients’ section and mostly focusses on healthy volunteers.” | participants, sample

table . newly proposed categories of entities found in , publications in the biomedical literature that could not be mapped to existing terms in iao. elements in italics have previously been submitted by us for inclusion into iao and added in the latest release (v - - ).

known phenotype papers accuracy
cancer .
gastrointestinal diseases .
metabolic syndrome .
neurodegenerative, psychiatric, brain illnesses .

table . summary of results for named-entity recognition (ner) of phenotypes in mwas papers.

others for specific tasks.
for example, no new findings can be found in an introduction, but it is well suited to discovering the main phenotypes under study; details on how these phenotypes are studied, and with which technologies, can only be found in materials/methods; and findings can only be found in results (and discussion) sections. hence it is important to classify these paragraphs, and auto-corpus does this by using the structure of the publication and the digraph. we showed that the assignment can be further improved by training machine learning models that distinguish, with good accuracy, between different types of text in cases where there may be ambiguity; this could be improved further by using a multi-class classifier trained on all paragraphs. these data are then available for use in downstream analyses using dedicated algorithms for entity recognition or other methods. auto-corpus is able to process all html formatted tables from both gwas and mwas corpora, as opposed to previous methods which could only operate on % of , tables ( ). it takes auto-corpus on average . seconds to process all tables within a publication, compared to several minutes if this is done manually. moreover, auto-corpus supports parallel computing, which further reduces the time needed because publications can be run in batch. the structured json output is machine readable and can be used to support data import into databases. here we used the json output of auto-corpus in several examples to demonstrate potential use cases. we demonstrated that existing algorithms trained on biomedical data can be fine-tuned to recognize new entities such as assays and phenotypes, which also opens up the possibility of using these data to train new deep learning algorithms for recognizing new entities such as metabolites (as opposed to chemical entities), snps and p-values, as well as identifying the relationships between them from text.
ner algorithms have difficulty recognizing abbreviated terms; the list of abbreviations found by auto-corpus can therefore be used to replace all abbreviations in the text with their definitions.

conclusion

the auto-corpus package is freely available and can be deployed on local machines as well as on high-performance computing systems to process publications in batch. a step-by-step guide detailing how to use auto-corpus is supplied with the package. the key features of auto-corpus are that it: (i) outputs all text and table data in a standardized json format, (ii) classifies each paragraph into separate categories of text, and (iii) is implemented in pure python code and does not have non-python dependencies.

fig. . example of json format for table data from this work (shown for table ). the auto-corpus output for tables consists of ‘status’, ‘error message’ and ‘tables’ as top level fields, ‘tables’ has fields ‘identifier’, ‘title’, ‘columns’, ‘section’ and ‘footer’, and ‘section’ contains ‘section name’ and ‘results’.

fig. . example of json output of abbreviation detection using a rule-based approach on an mwas publication ( ).

fig. . example of json output of named-entity recognition (ner) on an mwas publication ( ) using a fine-tuned transformer-based deep learning model for assays and bidirectional long-short term memory network for chemical entity recognition.
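the abbreviation replacement suggested in the discussion can be sketched as follows. this is a minimal illustration and not the auto-corpus implementation: the function name `expand_abbreviations` is hypothetical, and the abbreviation list is assumed to have been loaded into a plain python dict that maps each short form to its definition, which may differ from the layout of the auto-corpus json output. matching is case-sensitive and allows an optional plural "s".

```python
import re


def expand_abbreviations(text, abbreviations):
    """Replace each abbreviation in `text` with its definition.

    `abbreviations` maps short forms to long forms. A plain dict is an
    assumed, simplified stand-in for the auto-corpus abbreviation output.
    """
    if not abbreviations:
        return text
    # Try longer short forms first so that "GWAS" wins over a shorter "GWA".
    keys = sorted(abbreviations, key=len, reverse=True)
    alternatives = "|".join(re.escape(key) for key in keys)
    # Match whole tokens only (no letter or digit on either side),
    # optionally followed by a plural "s".
    pattern = re.compile(
        r"(?<![A-Za-z0-9])(" + alternatives + r")(s?)(?![A-Za-z0-9])"
    )

    def substitute(match):
        # Keep the plural "s" so "SNPs" becomes "...polymorphisms".
        return abbreviations[match.group(1)] + match.group(2)

    return pattern.sub(substitute, text)
```

running the expanded text through an ner model, rather than the original text, is one way the abbreviation list could feed into the downstream analyses described above.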
acknowledgements

we thank mohamed ibrahim (university of leicester) for identifying different configurations of tables for different html formats, and joy li and filip makraduli (imperial college london) for testing the package and providing feedback.

author contributions

tb and jmp designed and supervised the research. ss and yh developed the pipeline and analyzed data. ss developed the initial table extraction algorithm and implemented the phenotype recognition algorithm. yh developed the section header standardization algorithm and implemented the abbreviation recognition algorithm. ss fine-tuned the table extraction algorithm for use on non-pmc texts. tr refined standardization of full texts and contributed algorithms for utf- and utf- conversions of non-ascii characters to unicode. ss, yh, tb and jmp wrote the manuscript.

funding

this work has been supported by health data research (hdr) uk and the medical research council via an ukri innovation fellowship to tb (mr/s / ) and a rutherford fund fellowship to jmp (mr/s / ).

footnote

orcid: - - - (jmp).

bibliography

seyedmostafa sheikhalishahi, riccardo miotto, joel t dudley, alberto lavelli, fabio rinaldi, and venet osmani. natural language processing of clinical notes on chronic diseases: systematic review. jmir med inform, ( ):e , . issn - . doi: . / .
ramón a-a. erhardt, reinhard schneider, and christian blaschke. status of text-mining techniques applied to biomedical text. drug discovery today, ( ): – , . issn - . doi: https://doi.org/ . /j.drudis. . . .
nikola milosevic, cassie gregson, robert hernandez, and goran nenadic. a framework for information extraction from tables in biomedical literature. international journal on document analysis and recognition (ijdar), ( ): – , . doi: . / s - - - .
peter m. visscher, naomi r. wray, qian zhang, pamela sklar, mark i. mccarthy, matthew a. brown, and jian yang. years of gwas discovery: biology, function, and translation. the american journal of human genetics, ( ): – , . issn - . doi: https://doi.org/ . /j.ajhg. . . .
tim beck, tom shorter, and anthony j brookes. gwas central: a comprehensive resource for the discovery and comparison of genotype and phenotype data from genome-wide association studies. nucleic acids research, (d ):d –d , . issn - . doi: . /nar/gkz .
annalisa buniello, jacqueline a l macarthur, maria cerezo, laura w harris, james hayhurst, cinzia malangone, aoife mcmahon, joannella morales, edward mountjoy, elliot sollis, daniel suveges, olga vrousgou, patricia l whetzel, ridwan amode, jose a guillen, harpreet s riat, stephen j trevanion, peggy hall, heather junkins, paul flicek, tony burdett, lucia a hindorff, fiona cunningham, and helen parkinson. the nhgri-ebi gwas catalog of published genome-wide association studies, targeted arrays and summary statistics . nucleic acids research, (d ):d –d , . issn - . doi: . /nar/gky .
jeremy k. nicholson, elaine holmes, and paul elliott. the metabolome-wide association study: a new look at human disease risk factors. journal of proteome research, ( ): – , . doi: . /pr . pmid: .
werner ceusters. an information artifact ontology perspective on data collections and associated representational artifacts. studies in health technology and informatics, : – , . issn - .
alan ruttenberg, adam goldstein, albert goldfain, barry smith, bjoern peters, carlo torniai, chris mungall, chris stoeckert, christian a. boelling, darren natale, david osumi-sutherland, gwen frishkoff, holger stenzhorn, james a. overton, james malone, jennifer fostel, jie zheng, jonathan rees, larisa soldatova, lawrence hunter, mathias brochhausen, matt brush, melanie courtot, michel dumontier, paolo ciccarese, pat hayes, philippe rocca-serra, randy dipert, ron rudnicki, satya sahoo, sivaram arabandi, werner ceusters, william duncan, william hogan, and yongqun (oliver) he. information artefact ontology (v - - ). https://raw.githubusercontent.com/information-artifact-ontology/iao/v - - /iao.owl, . accessed: - - .
a. ghazvinian, n. f. noy, and m. a. musen. creating mappings for ontologies in biomedicine: simple methods work. amia annu symp proc, : – , .
peter n. robinson, sebastian köhler, sebastian bauer, dominik seelow, denise horn, and stefan mundlos. the human phenotype ontology: a tool for annotating and analyzing human hereditary disease. the american journal of human genetics, ( ): – , . issn - . doi: https://doi.org/ . /j.ajhg. . . .
ariel schwartz and marti hearst. a simple algorithm for identifying abbreviation definitions in biomedical text. pacific symposium on biocomputing, : – , . doi: . / _ .
katrin fundel, robert küffner, and ralf zimmer. relex—relation extraction using dependency parse trees. bioinformatics, ( ): – , . issn - . doi: . /bioinformatics/btl .
jinhyuk lee, wonjin yoon, sungdong kim, donghyeon kim, sunkyu kim, chan ho so, and jaewoo kang. biobert: a pre-trained biomedical language representation model for biomedical text mining. bioinformatics, ( ): – , . issn - . doi: . /bioinformatics/btz .
peter corbett and john boyle. chemlistem: chemical named entity recognition using recurrent neural networks. journal of cheminformatics, ( ), . doi: . / s - - - .
charles r.
evans, alla karnovsky, melissa a. kovach, theodore j. standiford, charles f. burant, and kathleen a. stringer. untargeted lc–ms metabolomics of bronchoalveolar lavage fluid differentiates acute respiratory distress syndrome from health. journal of proteome research, ( ): – , . doi: . /pr .
nikola milosevic, cassie gregson, robert hernandez, and goran nenadic. disentangling the structure of tables in scientific literature. in elisabeth métais, farid meziane, mohamad saraee, vijayan sugumaran, and sunil vadera, editors, natural language processing and information systems, pages – . springer international publishing, . isbn - - - - . doi: https://doi.org/ . / - - - - _ .

deephbv: a deep learning model to predict hepatitis b virus (hbv) integration sites.

canbiao wu ¶, xiaofang guo ¶, mengyuan li ¶, xiayu fu , zeliang hou , manman zhai , , jingxian shen , xiaofan qiu , zifeng cui , hongxian xie , pengmin qin , xuchu weng , zheng hu , *, jiuxing liang *

key laboratory of brain, cognition and education sciences, ministry of education, china; institute for brain research and rehabilitation, south china normal university, guangzhou, china.
department of medical oncology of the eastern hospital, the first affiliated hospital, sun yat-sen university, guangzhou, guangdong, china
department of gynecological oncology, the first affiliated hospital, sun yat-sen university, guangzhou, guangdong, china
department of thoracic surgery, the first affiliated hospital, sun yat-sen university, guangzhou, guangdong, china
school of psychology, south china normal university, guangzhou, guangdong, china
generulor company bio-x lab, guangzhou, guangdong, china
department of obstetrics and gynecology, tongji hospital, tongji medical college, huazhong university of science and technology, wuhan, hubei, china

*corresponding author email: huzheng @ .com (zh), liangjiuxing@m.scnu.edu.cn (jl)
¶these authors contributed equally to this work.

.cc-by . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . /

abstract

hepatitis b virus (hbv) is one of the main causes of viral hepatitis and liver cancer. previous studies showed that hbv can integrate into the host genome and further promote malignant transformation. in this study, we developed an attention-based deep learning model, deephbv, to predict hbv integration sites by learning local genomic features automatically. we trained and tested deephbv using the hbv integration site data from the dsvis database. initially, deephbv showed an auroc of . and an aupr of . on the dataset. adding repeat peaks and tcga pan cancer peaks significantly improved the model performance, with an auroc of . and . and an aupr of . and . , respectively. on an independent validation dataset of hbv integration sites from visdb, deephbv with hbv integration sequences plus tcga pan cancer (auroc of .
and aupr of . ) performed better than deephbv with hbv integration sequences plus repeat peaks (auroc of . and aupr of . ). next, we found that transcription factor binding sites (tfbs) were significantly enriched near the genomic positions that received high attention from the convolutional neural network. the binding sites of ar-halfsite, arnt, atf , bhlhe , bhlhe , bmal , clock, c-myc, coup-tfii, e a, ebf , erra and foxo were highlighted by the deephbv attention mechanism in both the dsvis and visdb datasets, revealing the hbv integration preference. in summary, deephbv is a robust and explainable deep learning model, not only for the prediction of hbv integration sites but also for further mechanistic study of hbv induced cancer.

author summary

hepatitis b virus (hbv) is one of the main causes of viral hepatitis and liver cancer. previous studies showed that hbv can integrate into the host genome and further promote malignant transformation. in this study, we developed an attention-based deep learning model, deephbv, to predict hbv integration sites by learning local genomic features automatically. the performance of the deephbv model improves significantly after adding genomic features, with an auroc of . and an aupr of . . furthermore, we identified enriched transcription factor binding sites of proteins in the regions highlighted by the convolutional neural network. in summary, deephbv is a robust and explainable deep learning model, not only for the prediction of hbv integration sites but also for the further study of the hbv integration mechanism.
introduction

hbv is the main cause of viral hepatitis and liver cancer (hepatocellular carcinoma: hcc) [ ]. it is a small dna virus that can integrate into the host genome via an rna intermediate [ ]. first, hbv attaches to and enters hepatocytes, then transports its nucleocapsid, which contains a relaxed circular dna (rcdna), to the host nucleus. in the host nucleus, rcdna is converted into covalently closed circular dna (cccdna), which produces messenger rnas (mrna) and pregenomic rna (pgrna) by transcription. via reverse transcription in the host nucleus, pgrna produces new rcdna and double-stranded linear dna (dsldna), which tend to integrate into the host cell genome [ ]. a previous study showed that hbv integration breakpoints are distributed randomly across the whole genome with a handful of hotspots [ ]. for instance, hbv was reported to integrate recurrently into the telomerase reverse transcriptase (tert) and myeloid/lymphoid or mixed-lineage leukemia (mll , also known as kmt b) genes. these insertional events were also accompanied by altered expression of the integrated gene [ , , ], indicating important biological impacts on the local genome. further analysis revealed an association between hbv integration and genomic instability in these insertional events [ ]. moreover, significant enrichment of hbv integration was found near the following genomic features in tumours compared to non-tumour tissue: repetitive regions, fragile sites, cpg islands and telomeres [ ]. however, the pattern and the mechanism of hbv integration remain to be explored.
many of the hbv integration sites are distributed throughout the human genome and seem completely random [ , , ]. whether the features and patterns of these “random” viral integration events can be learned and extracted remains an open question which, once solved, will greatly improve the understanding of hbv integration induced carcinogenesis. deep learning performs well in computational biology research, for example in medical image identification [ ] and in discovering motifs in protein sequences [ ]. the convolutional neural network (cnn) is a core architecture in deep learning, which enables a computer to learn from training data [ ]. though deep learning performs well in a variety of fields, how it reaches a decision is hard to explain because of its black box nature. therefore, an approach named the attention mechanism, which can highlight the most influential parts of the input, was invented to open the “black box” [ , ]. in this study, we developed deephbv, an attention-based model to predict hbv integration sites using deep learning. the attention mechanism calculates an attention weight for each position and, at the same time, connects the encoder and the decoder. it highlights the regions that deephbv concentrates on and helps to identify the patterns that received attention. deephbv can predict hbv integration sites accurately and specifically, and the attention mechanism identified positions with potentially important biological meaning.
results

deephbv effectively predicts hbv integration sites by adding genomic features.

the deephbv model structure and the scheme of encoding a kb sample into a binary matrix are described in fig . the deephbv model was tested with our hbv integration site database, dsvis (http://dsvis.wuhansoftware.com). hbv integration sequences were prepared from the hbv integration sites as positive/negative samples following the steps in the method section. the number of negative samples was set to twice the number of positive samples to keep the data balanced and to improve the confidence level. the positive samples were divided into and as the positive training dataset and testing dataset. correspondingly, we extracted and negative samples as the negative training dataset and testing dataset. deephint, an existing deep learning model for predicting hiv integration sites from their surroundings [ ], was also evaluated using hbv integration sequences for training and testing. both models were trained on the same hbv integration training dataset and evaluated on the same testing dataset. deephbv with hbv integration sequences showed an auroc of . and an aupr of . , while deephint with hbv integration sequences demonstrated an auroc of . and an aupr of . (fig ). the comparison of deephbv and deephint is described in the discussion. several previous studies showed that hbv integration has a preference for surrounding genomic features such as repeats, histone markers, cpg islands, etc [ , ].
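the binary-matrix encoding mentioned at the start of this section can be illustrated with a one-hot scheme. this is only a sketch under stated assumptions: the text does not give the channel order or the treatment of ambiguous bases, so the a, c, g, t order and the all-zero row for unknown bases such as n are choices made here for illustration, and the function name `one_hot_encode` is hypothetical.

```python
def one_hot_encode(sequence):
    """Encode a DNA string as a list of 4-element binary rows (A, C, G, T).

    Assumptions made for this sketch: the channel order is A, C, G, T and
    any other character (for example N) becomes an all-zero row.
    """
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    matrix = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix
```

a sample of n bases becomes an n × 4 binary matrix, which is the kind of input a one-dimensional convolution layer can scan along the sequence axis.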
thus, we tried to add these genomic features into deephbv by mixing genomic feature samples with hbv integration sequences to form new datasets, and then trained and tested the updated deephbv models. we downloaded the following genomic features from different datasets [ - ] and grouped them into four subgroups: ( ) dnase clusters, fragile site, repeatmasker; ( ) cpg islands, genehancer; ( ) cons mammals, tcga pan-cancer; ( ) h k me chip-seq, h k ac chip-seq (s fig). after obtaining the positions of the genomic features (sources are listed in s table), we extended the positions to bp and extracted the corresponding sequences from the hg reference genome. we defined these sequences as positive genomic feature samples. we then mixed hbv integration sequences, positive genomic feature samples and randomly picked negative genomic feature samples (see method) and trained the deephbv model. once a subgroup performed well, we re-tested each genomic feature in that subgroup to determine which specific genomic feature significantly affected the model performance (s fig; auroc and aupr values are recorded in s table). from the roc and pr curves, we found that deephbv with hbv integration sites plus the genomic features repeat (auroc: . and aupr: . ) or tcga pan cancer (auroc: . and aupr: . ) significantly improved hbv integration site prediction compared with deephbv trained on hbv integration sequences alone (fig ). we performed the same test on deephint, but did not find a subgroup that substantially improved its performance (results are recorded in s table).
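the step of extending feature positions to a fixed length can be sketched as below. this is an illustrative helper, not the authors' code: the name `centered_window`, the choice to centre the window on the feature midpoint, and the choice to shift (rather than truncate) windows at chromosome ends are all assumptions, and the window width is left as a parameter because the value is not recoverable from the text.

```python
def centered_window(start, end, width, chrom_length):
    """Return a half-open [win_start, win_end) window of `width` bp.

    The window is centred on the midpoint of the feature [start, end) and
    shifted, not truncated, at chromosome ends so that every sample has the
    same length. Both choices are assumptions made for this sketch.
    """
    if width > chrom_length:
        raise ValueError("window is longer than the chromosome")
    midpoint = (start + end) // 2
    win_start = midpoint - width // 2
    # Clamp the start so the whole window stays inside [0, chrom_length).
    win_start = max(0, min(win_start, chrom_length - width))
    return win_start, win_start + width
```

keeping every window the same length means positive genomic feature samples and hbv integration sequences can be encoded into matrices of identical shape and mixed in one training set.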
together, adding repeat or tcga pan cancer samples to the hbv integration sequences significantly improves the performance of deephbv.

validation of deephbv using independent dataset visdb

because deephbv needs to generalise to other datasets, we tested the pre-trained deephbv models (deephbv with hbv integration sequences + repeat peaks and deephbv with hbv integration sequences + tcga pan cancer peaks) on the hbv integration site dataset of another virus integration site (vis) database, visdb [ ]. the model trained with hbv integration sequences + repeat sequences showed an auroc of . and an aupr of . , while the model trained with hbv integration sequences + tcga pan cancer showed an auroc of . and an aupr of . . the deephbv model with hbv integration sequences + tcga pan cancer performed better than the deephbv model with hbv integration sequences + repeat and was more robust on both the testing dataset from dsvis (auroc: . and aupr: . ) and the independent testing dataset from visdb (auroc: . and aupr: . ). thus, we decided to use this model for future studies of hbv integration sites.

study the preference pattern of hbv integration by conserved sequence elements

deephbv can extract features with translation invariance through its pooling operation, which enables it to recognise certain patterns even when the features are slightly shifted. adding an attention mechanism to the deephbv framework may partly open the deep learning black box by giving an attention weight to each position.
each attention weight represents the computational importance of that position in the deephbv judgement. the attention weights in the attention layer were extracted after two de-convolution operations and one de-pooling operation, and the output shape is × . each score represents the attention weight of a bp region. positions with higher attention weight scores may have a greater impact on the pattern recognition of deephbv, meaning that these positions may be critical for identifying hbv integration positive samples. we first averaged the fractions of attention scores over all hbv integration sequences and normalized them to the mean of all positions. we then visualised the fractions of attention scores and found peak-valley-peak patterns only in positive samples (fig ). we were interested in the positions with higher attention weights in the convolutional neural network. in the attention weight distribution of deephbv with hbv integration sites + tcga pan cancer, a cluster of attention weights much higher than the other weights often occurred in the positive samples, whereas the model of deephbv with hbv integration sites + repeat did not show this pattern (fig ). to further investigate the pattern behind these positions with higher attention weights, we defined the sites with the top % highest attention weights as attention intensive sites, and the regions of bp near them as attention intensive regions.
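the definition of attention intensive sites and regions can be sketched as follows. this is a hedged illustration and not the deephbv code: the function name and the parameters `top_fraction`, `bin_size` and `flank` are hypothetical stand-ins for the percentage, the per-weight region length and the region size, whose values are not recoverable from the text; taking the centre of each bin as the site is also an assumption.

```python
def attention_intensive_regions(weights, top_fraction, bin_size, flank):
    """Return (sites, regions) for the highest attention weights.

    weights      -- one attention weight per bin of `bin_size` bp
    top_fraction -- fraction of bins to keep (hypothetical parameter)
    flank        -- bp kept on each side of a site (hypothetical parameter)
    Sites are bin centres; regions are half-open intervals clipped to the
    sample, both in coordinates relative to the start of the sample.
    """
    if not weights:
        return [], []
    n_top = max(1, int(len(weights) * top_fraction))
    # Rank bins from highest to lowest weight, then restore positional order.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    top_bins = sorted(ranked[:n_top])
    sample_length = len(weights) * bin_size
    sites, regions = [], []
    for bin_index in top_bins:
        centre = bin_index * bin_size + bin_size // 2
        sites.append(centre)
        regions.append((max(0, centre - flank), min(sample_length, centre + flank)))
    return sites, regions
```

the returned coordinates are relative to the sample, so they would still need to be offset by the sample's genomic start before being compared with genome annotations.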
we mapped these attention intensive sites onto the hg reference genome together with genomic features (fig ), but found that the positional relationship between attention intensive sites and genomic features was not clear. the results indicated that other specific patterns closely related to hbv integration preference may exist and, when analysed carefully, could be recognized by the deephbv model. in deep learning, convolution and pooling modules learn patterns with translation invariance; as a result, a deep learning network tends to learn the domains that occur recurrently among different samples in the same pooling matrix, even if the learned feature is not at the same position in these samples [ , ]. attention intensive regions are therefore more likely to be conserved, owing to the translation invariance of the convolution and pooling modules, and could give hints about the selection preference of hbv integration sites. transcription factor binding site (tfbs) motifs are conserved genomic elements which can be critical to the regulation of downstream genes. therefore, we tested whether tfbs play important roles in hbv integration preference. we used all hbv integration samples with prediction scores higher than . from dsvis and visdb separately to test for enrichment of local tfbs motifs in attention intensive regions, using homer v . . [ ] with its vertebrate transcription factor databases (table ).
from the result of deephbv with hbv integration sequences + tcga pan cancer, binding sites of ar-halfsite, arnt, atf , bhlhe , bhlhe , bmal , clock, c-myc, coup-tfii, e a, ebf , erra, foxo , heb, hic , hif- b, lrf, meis , mitf, mnt, myog, n-myc, npas , npas, nr a , ptf a, snail , tbx , tbx , tcf , tead , tead , tead , tead, tgif , tgif , thrb, usf , usf , zac , zeb , zfx, znf , znf can be both enriched in attention intensive regions of dsvis and visdb sequences. we selected two representative samples to obtain a more intuitive display. genomic features, hbv integration sites from dsvis and visdb, attention intensive sites and tfbs were aligned and shown in hg reference genome (fig ). most attention intensive sites can be mapped to enrich tf motifs. and the clusters of high attention weight from the output of deephbv with hbv integration sites plus tcga pan cancer showed the .cc-by . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . / binding site of a tumour suppressor gene hic , circadian clock related elements bmal , clock, c-myc and naps (fig ). the data provided novel insights into hbv integration site selection preference and reveal biological importance that warrants future experimental confirmation. table . enriched tfbs from attention intensive regions of deephbv with hbv integration sites + tcga pan cancer peaks. homer known results homer de novo results rank name p-value rank best match/details p-value bmal e- tead e- npas . e- ebf e- clock . e- tcf e- c-myc . e- grhl e- zfx . e- dux e- tgif . e- ptf a e- mnt . e- tead e- lrf . e- ahr::arnt . e- tbx . e- sox . e- znf . e- tead . e- n-myc . e- zic . e- znf . e- nr e . e- usf . e- sox . 
e- bhlhe . e- zbtb . e- rbpj . e- usf . e- zac . e- isl . e- tgif . e- znf . e- zeb . e- ascl . e- thrb . e- znf . e- ptf a . e- lrf . e- bhlhe . e- znf . e- tead . e- pknox . e- stat . e- bcl b . e- meis . e- arnt . e- c-myc . e- osr . e- usf . e- tfap a . e- npas . e- hic . e- tead . e- tead . e- ar-halfsite . e- stat . e- tcf . e- mitf . e- tead . e- atf . e- hif- b . e- foxo . e- e a . e- tead . e- mef a . e- znf . e- nkx . . e- coup-tfii . e- myog . e- nkx . . e- snail . e- heb . e- tbx . e- scrt . e- nr a . e- nanog . e- oct . e- elk . e- erra . e- gata . e- bhlha . e- amyb . e- nr a . e- nfkb-p -rel . e- zic . e- trps . e- hoxa . e- hif a . e- isl . e- cebp:ap . e- ews:fli -fusion . e- foxk . e- ets . e-

discussion

in this study, we developed an explainable attention-based deep learning model, deephbv, to predict hbv integration sites. in the comparison of deephbv and deephint on predicting hbv integration sites (s table), deephbv outperformed deephint after genomic features were added, owing to a model structure and parameters better suited to recognising the surroundings of hbv integration sites.
we applied two convolution layers ( st layer: convolution kernels and the kernel size is ; nd layer: convolution kernels and the kernel size is ) and one pooling layer (with pooling size of ) in deephbv, whereas the deephint model has only one convolution layer ( convolution kernels and the kernel size is ) and one pooling layer (with pool size of ). increasing the number of convolution layers allows information from higher dimensions to be extracted, and increasing the number of convolution kernels allows more feature information to be extracted [ ]. we trained the deephbv model using three strategies: ( ) dna sequences near hbv integration sites (hbv integration sequences), ( ) hbv integration sequences + tcga pan cancer peaks, ( ) hbv integration sequences + repeat peaks. we found that adding either tcga pan cancer or repeat peaks to the hbv integration sequences significantly improved model performance, and that deephbv with hbv integration sequences + tcga pan cancer peaks performed better on the independent test dataset visdb. however, the attention intensive regions cannot be well aligned to these genomic features. thus, we further inferred that other features such as tfbs motifs may influence deephbv in the prediction process, and homer was applied to recognise the tfbs that might be related to hbv-related diseases or cancer development.
we noticed that the attention intensive regions identified by the attention mechanism of deephbv with hbv integration sequences + tcga pan cancer showed strong concentration on the binding sites of the tumour suppressor gene hic , the circadian clock-related elements bmal , clock, c-myc, npas , and the transcription factors tead and nr a . these dna binding proteins are closely related to tumour development [ - ]. for instance, hic is a tumour suppressor gene in hepatocarcinogenesis [ , ]. bmal , clock, c-myc and npas all participate in the regulation of the circadian clock [ ], which is reported to promote hbv-related diseases [ , ]. in accordance, the binding motifs of circadian clock-related elements were also enriched in the attention intensive regions of deephbv with hbv integration sequences + repeats, further confirming the results (s table). in addition, the other transcription factors identified by deephbv are tead and nr a . tead deregulation affects well-established cancer genes such as braf, kras, myc, nf and lkb , and shows high correlation with clinicopathological parameters in human malignancies [ ]. nr a (also known as liver receptor homolog- , lrh- ) binds to the enhancer ii (enii) of hbv genes and serves as a critical regulator of their expression [ ]. in summary, deephbv is a robust deep learning model that uses a convolutional neural network to predict hbv integration sites. our data provide new insight into the preference of hbv integration and into mechanistic research on hbv-induced cancer.
methods

data preparation

detailed step-by-step instructions for deephbv are provided in s and s notes. to obtain positive training and testing samples for deephbv, we extracted bp dna sequences upstream and bp dna sequences downstream of hbv integration sites as the positive dataset. each sample was denoted as $S = (n_1, n_2, \ldots, n_n)$, where $n_i$ represents the nucleotide at position $i$ and $n$ is the sequence length. deephbv, as a deep learning network, also requires negative samples that do not contain hbv integration sites as background. the existence of hbv integration hot spots, which contain several integration events within a ~ kb range [ ], prompted us to select background regions that keep enough distance from known hbv integration sites. thus, we discarded the regions of length kb around known hbv integration sites on the hg reference genome and randomly selected kb dna sequences from the remaining regions as negative samples. we encoded the extracted dna sequences using one-hot code to make the calculation of distance and similarity between features during training more accurate. original dna sequences were converted to binary matrices of -bit length where each dimension corresponds to one nucleotide type. finally, we converted a bp dna sequence into a × binary matrix.

feature extraction

the deephbv model first applied a convolution and pooling module to learn and obtain sequence features around hbv integration sites (s fig). each binary matrix representing a dna sequence entered the convolution and pooling module for the convolution calculation.
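the one-hot encoding described under data preparation can be sketched in plain python as follows. this is a minimal illustration, not the authors' implementation: the channel order (a, c, g, t), the function name `one_hot_encode`, and the all-zero treatment of ambiguous bases such as n are assumptions made for this example.

```python
# Channel order is an assumption; the text only states that each
# dimension corresponds to one nucleotide type.
NUCLEOTIDES = "ACGT"


def one_hot_encode(seq):
    """Encode a DNA string as a list of rows, one 4-element binary row per base."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    matrix = []
    for base in seq.upper():
        row = [0] * len(NUCLEOTIDES)
        col = index.get(base)
        if col is not None:  # assumption: ambiguous bases (e.g. N) stay all-zero
            row[col] = 1
        matrix.append(row)
    return matrix


encoded = one_hot_encode("acgtn")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

each row has at most one non-zero entry, so a sequence of length n becomes an n × 4 binary matrix that can be passed to a convolution layer.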
we employed multiple convolution kernels in the calculation in order to obtain different features. let $S = (n_1, n_2, \ldots, n_n)$ denote a specific dna sequence and $E$ the binary matrix encoded from $S$. the convolutional calculation in the convolution layer is $X = conv(E)$, which can be described as:

$$X_{k,i} = \sum_{j=0}^{p-1} \sum_{l=1}^{L} W_{k,j,l}\, E_{l,i+j} \qquad (1)$$

where $1 \le k \le d$, $d$ refers to the number of kernels, $1 \le i \le n - p + 1$, $i$ refers to the index, $p$ refers to the kernel size, $n$ refers to the input sequence length, $l$ runs over the $L$ rows of $E$ (one per nucleotide type), and $W$ refers to the kernel weight. after extracting the eigenvectors, the convolutional layer activated them using the rectified linear unit (relu). relu is an activation function in artificial neural networks which can be described as $f(x) = \max(0, x)$. we applied relu to the output matrix of each convolution layer, mapping each element into a sparse matrix. relu imitates real neuron activation, which enables the model to fit the data better. we then applied max-pooling to reduce dimensionality while retaining as much predictive information as possible. at this point, we obtained the final eigenvector $F_c$ from the binary matrix representing the dna sequence after feature extraction in the convolution and pooling module.

attention mechanism in deephbv model

deephbv added an attention mechanism in order to capture and understand the contribution of each position in the extracted eigenvector $F_c$. the eigenvector entered the attention layer, which calculates a weight value for each dimension in $F_c$.
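the convolution, relu and max-pooling steps of the feature extraction module can be illustrated with a small pure-python sketch. it is a toy example under stated assumptions rather than the deephbv code: the input matrix is stored with one row per nucleotide type (a, c, g, t) and one column per position, indices are 0-based, and the single kernel, its weights and the pooling size are hypothetical values chosen so the arithmetic can be checked by hand.

```python
def conv1d(E, W):
    """Valid 1-D convolution: X[k][i] = sum over j, l of W[k][j][l] * E[l][i + j]."""
    L, n = len(E), len(E[0])
    p = len(W[0])  # kernel size
    return [
        [
            sum(W_k[j][l] * E[l][i + j] for j in range(p) for l in range(L))
            for i in range(n - p + 1)
        ]
        for W_k in W
    ]


def relu(X):
    """Element-wise f(x) = max(0, x)."""
    return [[max(0, x) for x in row] for row in X]


def max_pool(X, size):
    """Non-overlapping max pooling along the position axis (remainder dropped)."""
    return [
        [max(row[s:s + size]) for s in range(0, len(row) - size + 1, size)]
        for row in X
    ]


# One-hot matrix for the toy sequence "ACGTAC", rows ordered A, C, G, T.
E = [
    [1, 0, 0, 0, 1, 0],  # A
    [0, 1, 0, 0, 0, 1],  # C
    [0, 0, 1, 0, 0, 0],  # G
    [0, 0, 0, 1, 0, 0],  # T
]
# One hypothetical kernel of size 2: rewards "A" then "C", penalises a leading "G".
W = [[[1, 0, -1, 0], [0, 1, 0, 0]]]

X = conv1d(E, W)
F_c = max_pool(relu(X), size=2)
print(X)    # [[2, 0, -1, 0, 2]]
print(F_c)  # [[2, 0]]
```

the kernel responds most strongly where the dinucleotide "ac" occurs, relu removes the negative response, and pooling keeps the strongest response in each window.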
the attention weight represents the contribution of that position within the convolutional neural network (cnn). the output of the attention layer, $t_j$, is the contribution score: a larger $t_j$ means a bigger contribution of this position to the prediction of hbv integration sites. all contribution scores were normalized to obtain the dense eigenvector, denoted as $F_a$:

$$F_a = \sum_{j=1}^{q} a_j v_j \qquad (2)$$

where

$$a_j = \frac{\exp(t_j)}{\sum_{i=1}^{q} \exp(t_i)} \qquad (3)$$

where $a_j$ represents the corresponding normalisation score and $v_j$ represents the eigenvector at position $j$ of the input eigenmatrix. each position represents an extracted eigenvector in each convolution kernel. the convolution-pooling module and the attention mechanism module need to be combined in the prediction process; in other words, the eigenvector $F_c$ and the corresponding importance-weighted eigenvector $F_a$ should work together in predicting hbv integration sites. we concatenated the values in eigenvector $F_c$ and linearly mapped them to a new vector $F_v$:

$$F_v = dense(flatten(F_c)) \qquad (4)$$

in this step, the flatten layer performed the function $flatten()$ to reduce dimension and concatenate data; the function $dense()$ was executed by the dense layer, which maps the dimension-reduced data to a single value. the concatenation of $F_v$ and $F_a$ then entered the linear classifier to calculate the probability that hbv integration happened within the current sequence:

$$P = sigmoid(concat(F_a, F_v)) \qquad (5)$$

where $P$ is the predicted score, $sigmoid()$ represents the activation function acting as classifier in the final output, and $concat()$ represents the concatenation operation.
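the attention normalisation, the weighted sum that yields the attention feature, and the final sigmoid classifier can be sketched as below. this is an illustrative sketch only: the function names, the toy vectors, and in particular the explicit `weights` and `bias` of the linear classifier are assumptions, since the text does not give the classifier parameters.

```python
import math


def softmax(scores):
    """a_j = exp(t_j) / sum_i exp(t_i); shifted by the max for numerical stability."""
    m = max(scores)
    exps = [math.exp(t - m) for t in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(V, t):
    """F_a = sum_j a_j * v_j for position vectors v_j with contribution scores t_j."""
    a = softmax(t)
    dim = len(V[0])
    F_a = [sum(a[j] * V[j][c] for j in range(len(V))) for c in range(dim)]
    return F_a, a


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(F_a, F_v, weights, bias):
    """P = sigmoid(w . concat(F_a, F_v) + b); weights and bias are hypothetical."""
    features = F_a + F_v  # list concatenation plays the role of concat()
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Three positions with 2-dimensional feature vectors and equal scores.
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t = [0.0, 0.0, 0.0]
F_a, a = attention_pool(V, t)
F_v = [0.5]  # stand-in for dense(flatten(F_c))
P = predict(F_a, F_v, weights=[0.0, 0.0, 0.0], bias=0.0)
print(a, F_a, P)
```

with equal scores every position receives the same weight, so the attention feature is the plain average of the position vectors; raising one score shifts the average towards that position, which is what makes the weights usable as a per-position importance measure.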
in the meantime, if we give the output eigenvector $F_c$ from the convolution-and-pooling module as input and execute the attention mechanism, the weight vector $W$ can be obtained:

$$W = att(a_1, a_2, \ldots, a_q) \qquad (6)$$

where $att()$ refers to the attention mechanism, $a_i$ denotes the eigenvector in the $i$-th dimension of the eigenmatrix, and $W$ represents the set of contribution scores of each position in the eigenmatrix extracted by the convolution-and-pooling module.

deephbv model training

after confirming each parameter in deephbv (s table), we trained the deep learning neural network model deephbv with the binary cross-entropy loss. the loss function of deephbv can be defined as:

$$loss = -\sum_i \left[ y_i \log(P_i) + (1 - y_i) \log(1 - P_i) \right] \qquad (7)$$

where $y_i$ is the binary tag value of the $i$-th sequence (in this dataset, positive samples were labelled as 1 and negative samples were labelled as 0) and $P_i$ is the prediction score $P$ for that sequence. the back propagation algorithm was adopted in the training process and the nesterov-accelerated adaptive moment estimation (nadam) gradient descent algorithm was applied to optimise the model parameters. the deep learning neural network model was implemented with python . and keras library . . [ ], using three nvidia® tesla v -pcie- g gpus (nvidia corporation, california, usa) for training and testing. deephbv takes around min and s for model training and testing respectively on the computational platform under these software and hardware settings.

data availability

deephbv is available as open-source software and can be downloaded from https://github.com/jiuxingliang/deephbv.git

reference

liang tj. hepatitis b: the virus and disease.
hepatology ; ( suppl):s - .
tu t, budzinska ma, shackel na et al. hbv dna integration: molecular mechanisms and clinical implications. viruses ; ( ).
sung wk, zheng h, li s et al. genome-wide survey of recurrent hbv integration in hepatocellular carcinoma. nat genet ; ( ): - .
zhao lh, liu x, yan hx et al. genomic and oncogenic preference of hbv integration in hepatocellular carcinoma. nat commun ; : .
ding d, lou x, hua d et al. recurrent targeted genes of hepatitis b virus in the liver cancer genomes identified by a next-generation sequencing-based approach. plos genet ; ( ):e .
tu t, budzinska ma, vondran fwr et al. hepatitis b virus dna integration occurs early in the viral life cycle in an in vitro infection model via sodium taurocholate cotransporting polypeptide-dependent uptake of enveloped virus particles. j virol ; ( ).
mason ws, gill us, litwin s et al. hbv dna integration and clonal hepatocyte expansion in chronic hepatitis b patients considered immune tolerant. gastroenterology ; ( ): - e .
litjens g, kooi t, bejnordi be et al. a survey on deep learning in medical image analysis. med image anal ; : - .
bailey tl, baker me, elkan cp. an artificial intelligence approach to motif discovery in protein sequences: application to steroid dehydrogenases. the journal of steroid biochemistry and molecular biology ; ( ): - .
yamashita r, nishio m, do rkg et al. convolutional neural networks: an overview and application in radiology. insights into imaging ; ( ): - .
bahdanau d, cho k, bengio y. neural machine translation by jointly learning to align and translate.
computer science .
guidotti r, monreale a, ruggieri s et al. a survey of methods for explaining black box models. acm comput. surv. ; ( ):article .
hu z, zhu d, wang w et al. genome-wide profiling of hpv integration in cervical cancer identifies clustered genomic hot spots and a potential microhomology-mediated integration mechanism. nat genet ; ( ): - .
chollet fao. keras. .
hu h, xiao a, zhang s et al. deephint: understanding hiv- integration via deep learning with attention. bioinformatics ; ( ): - .
haeussler m, zweig as, tyner c et al. the ucsc genome browser database: update. nucleic acids res ; (d ):d -d .
inoue f, kircher m, martin b et al. a systematic comparison reveals substantial differences in chromosomal versus episomal encoding of enhancer activity. genome res ; ( ): - .
robinson jt, thorvaldsdottir h, winckler w et al. integrative genomics viewer. nature biotechnology ; ( ): - .
tang d, li b, xu t et al. visdb: a manually curated database of viral integration sites in the human genome. nucleic acids res .
zhang w, itoh k, tanida j et al. parallel distributed processing model with local space-invariant interconnections and its optical architecture. appl opt ; ( ): - .
bruna j, zaremba w, szlam a et al. spectral networks and locally connected networks on graphs. computer science .
heinz s, benner c, spann n et al. simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and b cell identities. molecular cell ; ( ): - .
seide f, gang l, dong y.
conversational speech transcription using context-dependent deep neural networks. .
taniguchi k, roberts lr, aderca in et al. mutational spectrum of beta-catenin, axin , and axin in hepatocellular carcinomas and hepatoblastomas. oncogene ; ( ): - .
zheng j, xiong d, sun x et al. signification of hypermethylated in cancer (hic ) as tumor suppressor gene in tumor progression. cancer microenviron ; ( ): - .
paibomesai mi, moghadam hk, ferguson mm et al. clock genes and their genomic distributions in three species of salmonid fishes: associations with genes regulating sexual maturation and cell cycling. bmc res notes ; : .
fekry b, ribas-latre a, baumgartner c et al. incompatibility of the circadian protein bmal and hnf alpha in hepatocellular carcinoma. nat commun ; ( ): .
mukherji a, bailey sm, staels b et al. the circadian clock and liver function in health and disease. j hepatol ; ( ): - .
huh hd, kim dh, jeong hs et al. regulation of tead transcription factors in cancer biology. cells ; ( ).
cai yn, zhou q, kong yy et al. lrh- /hb f and hnf synergistically up-regulate hepatitis b virus gene transcription and dna replication. cell research ; ( ): - .

figure legends

figure .
the deep learning framework applied in deephbv. (a) scheme of encoding a kb dna sequence into a binary matrix using one-hot code; (b) a brief flowchart of the deephbv structure; the matrix shape is given in brackets, and a detailed flowchart is shown in s fig.

figure . evaluation of deephbv and deephint model prediction performance on the test dataset. (a) receiver-operating characteristic (roc) curves and (b) precision recall (pr) curves, respectively. “deephbv with hbv integration sequences” refers to the deephbv model with only hbv integration sequences as input; “deephint with hbv integration sequences” refers to the deephint model with only hbv integration sequences as input; “deephbv with hbv integration sequences + repeat” refers to the deephbv model with hbv integration sequences and repeat sequences as input; “deephbv with hbv integration sequences + tcga pan cancer” refers to the deephbv model with hbv integration sequences and tcga pan cancer sequences as input; “deephbv with hbv integration sequences + repeat + (test) visdb” refers to deephbv using hbv integration sequences and repeat sequences for training and using visdb as independent test dataset; “deephbv with hbv integration sequences + tcga pan cancer + (test) visdb” refers to deephbv using hbv integration sequences and tcga pan cancer sequences for training and using visdb as independent test dataset.

figure . the attention weight distribution analysed by deephbv with hbv integration sequences + genomic features. (a) deephbv with hbv integration sequences + tcga pan cancer peaks; (b) deephbv with hbv integration sequences + repeat peaks.
the left graph shows the fractions of attention weight, which were averaged among all samples and normalized to the average of all positions; each index represents a bp region due to the multiple convolution and pooling operations. the graphs on the right are representative samples of the attention weight distribution of positive samples and negative samples.

figure . attention intensive regions highlighted essential local genomic features for predicting hbv integration sites. representative examples show the positional relationship between the attention intensive sites and several genomic features using the deephbv with hbv integration sequences + tcga pan cancer model on (a) chr : , , - , , (hg ), (b) chr : - (hg ). each of these two sequences contains hbv integration sites from both dsvis and visdb. enriched dna binding proteins were detected by homer from the attention intensive regions using the output of deephbv; we then applied fimo [ ] to find the enriched motif positions and label the motifs on the attention intensive regions. ucsc genome browser [ ] and matplotlib [ ] were used for visualisation. “hpv integration site” refers to the sites selected from our unpublished database used as testing samples. “attention intensive sites” denotes the sites with top % attention weight. “repeatmasker”, “tcga pan cancer”, “dnase clusters”, “con mammals”, “genehancer”, “layered h k ac”, “layered h k me ” are genomic features.

references

grant ce, bailey tl, noble ws. fimo: scanning for occurrences of a given motif. bioinformatics ; ( ): - .
haeussler m, zweig as, tyner c et al.
the ucsc genome browser database: update. nucleic acids res ; (d ):d -d .
hunter jd. matplotlib: a d graphics environment. computing in science & engineering ; ( ): - .

supporting information

s fig. deephbv framework. each part represents a layer in the neural network and $n \times n$ stands for the output dimension, which is explained in s note. two consecutive convolution layers were used to extract features; max-pooling layers reduce the dimension while preserving the predictive ability of the feature matrix; the dropout layer randomly drops some results to prevent over-fitting; the flatten layer is responsible for reducing the dimensions and connecting them; the dense layer maps the output of the last layer to a specific value; the attention layer and attention flatten give a weight score to each dimension in the feature matrix; the concatenate layer concatenates the captured features and the importance scores of those features from the convolution module and the attention mechanism module. the prediction output gives the final result, the probability of hbv integration.

s fig. prediction performance on the hbv integration dataset with different types of genomic features added in.
we found that adding character and character outperformed the deephbv model, with a significant increase in aupr and auroc scores, indicating that deephbv can capture genomic features from character and character effectively. we therefore analysed each single item in character groups and , and found that repeats and tcga pan cancer are the genomic features captured by deephbv that significantly improved model performance. deephbv with hbv integration sequences + repeats reached an auroc of . and an aupr of . , while deephbv with hbv integration sequences + tcga pan cancer reached an auroc of . and an aupr of . .

s table. the parameters for the deep neural network used in deephbv.

s table. genomic features and sources. (access date: november th, )

s table. comparison of deephbv and deephint result record.

s table. enriched tfbs from attention intensive regions of deephbv with hbv integration sites + repeat peaks.

s note. deephbv framework. deephbv neural network structure design and hyperparameters involved in deephbv are noted.

s note. mathematical matters of the deephbv. this part explains the mathematical matters of deephbv (i.e. encoding dna sequences, convolution layers, the max pooling layer, dropout layer, attention layer, concatenate layer, linear classifier and optimisation algorithm).
apobec mediated c-to-u rna editing: target sequence and trans-acting factor contribution to rna editing events in murine transcripts in-vivo. saeed soleymanjahi , valerie blanc and nicholas o.
davidson , division of gastroenterology, department of medicine, washington university school of medicine, st. louis, mo

to whom communication should be addressed: email: nod@wustl.edu

running title: apobec mediated c to u rna editing

keywords: rna folding; a cf; rbm ;

january ,

(which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . .

abstract ( words)

mammalian c-to-u rna editing was described more than years ago as a single nucleotide modification in apob rna in small intestine, later shown to be mediated by the rna-specific cytidine deaminase apobec . reports of other examples of c-to-u rna editing, coupled with the advent of genome-wide transcriptome sequencing, identified an expanded range of apobec targets. here we analyze the cis-acting regulatory components of verified murine c-to-u rna editing targets, including nearest neighbor as well as flanking sequence requirements and folding predictions. we summarize findings demonstrating the relative importance of trans-acting factors (a cf, rbm ) acting in concert with apobec . using this information, we developed a multivariable linear regression model to predict apobec dependent c-to-u rna editing efficiency, incorporating factors independently associated with editing frequencies based on sanger-confirmed editing sites, which accounted for % of the observed variance. co-factor dominance was associated with editing frequency, with rnas targeted by both rbm and a cf observed to be edited at a lower frequency than rbm dominant targets. the model also predicted a composite score for available human c-to-u rna targets, which again correlated with editing frequency.
introduction

mammalian c-to-u rna editing was identified as the molecular basis for human intestinal apob production more than three decades ago (chen et al. ; hospattankar et al. ; powell et al. ). a site-specific enzymatic deamination of c to u of apob mrna was originally considered the sole example of mammalian c-to-u rna editing, occurring at a single nucleotide in a kilobase transcript and mediated by an rna specific cytidine deaminase (apobec ) (teng et al. ). with the advent of massively parallel rna sequencing technology, we now appreciate that apobec mediated rna editing targets hundreds of sites (rosenberg et al. ; blanc et al. ), mostly within ’ untranslated regions of mrna transcripts. this expanded range of targets of c-to-u rna editing prompted us to reexamine key functional attributes in the regulatory motifs (both cis-acting elements and trans-acting factors) that impact editing frequency, focusing primarily on data emerging from studies of mouse cell and tissue-specific c-to-u rna editing. earlier studies identified rna motifs (davies et al. ) contained within a -nucleotide segment flanking the edited cytidine base in vivo (in cell lines) or within nucleotides using s extracts from rat hepatoma cells (bostrom et al. ; driscoll et al. ). those, and other studies, established that apob rna editing reflects both the tissue/cell of origin as well as rna elements remote and adjacent to the edited base (bostrom et al. ; davies et al. ). a granular examination of the regions flanking the edited base in apob rna demonstrated a critical ’ sequence - , downstream of c , in which mutations reduced or abolished editing activity (shah et al. ). this ’ site, termed a “mooring sequence”, was associated with a s- “editosome” complex (smith et al.
), which was both necessary and sufficient for site-specific apob rna editing and editosome assembly (backus and smith ). other cis-acting elements include a nucleotide spacer region between the edited cytidine and the mooring sequence, and also sequences ’ of the editing site that regulate editing efficiency (backus and smith ; backus et al. ) along with au-rich regions both ’ and ’ of the edited cytidine that together function in concert with the mooring sequence (hersberger and innerarity ). advances in our understanding of physiological apob rna editing emerged in parallel from both the delineation of key rna regions (summarized above) and the identification of components of the apob rna editosome (sowden et al. ). apobec , the catalytic deaminase (teng et al. ), is necessary for physiological c-to-u rna editing in vivo (hirano et al. ) and in vitro (giannoni et al. ). using the mooring sequence of apob rna as bait, two groups identified apobec complementation factor (a cf), an rna-binding protein sufficient in vitro to support efficient editing in the presence of apobec and apob mrna (lellek et al. ; mehta et al. ). those findings reinforced the importance of both the mooring sequence and an rna binding component of the editosome in promoting apob rna editing. however, while a cf and apobec are sufficient to support in vitro apob rna editing, neither heterozygous (blanc et al. ) nor homozygous genetic deletion of a cf impaired apob rna editing in vivo in mouse tissues (snyder et al. ), suggesting that an alternate complementation factor was likely involved.
other work identified a homologous rna binding protein, rbm , that functioned to promote apob rna editing both in vivo and in vitro (fossat et al. ), and more recent studies utilizing conditional, tissue-specific deletion of a cf and rbm indicate that both factors play distinctive roles in apobec -mediated c-to-u rna editing, including apob as well as a range of other apobec targets (blanc et al. ). these findings together establish important regulatory roles for both cis-acting elements and trans-acting factors in c-to-u mrna editing. however, the majority of studies delineating cis-acting elements reflect earlier, in vitro experiments using apob mrna and relatively little is known regarding the role of cis-acting elements in tissue-specific c-to-u rna editing of other transcripts, in vivo. here we use statistical modeling to investigate the independent roles of candidate regulatory factors in mouse c-to-u mrna editing using data from in vivo studies from over editing sites in transcripts (meier et al. ; rosenberg et al. ; gu et al. ; blanc et al. ; rayon-estrada et al. ; snyder et al. ; blanc et al. ; kanata et al. ). we also examined these regulatory factors in known human mrna targets (chen et al. ; powell et al. ; skuse et al. ; mukhopadhyay et al. ; grohmann et al. ; schaefermeier and heinze ).
results
descriptive data
c-to-u rna editing sites were identified based on eight studies that met inclusion and exclusion criteria (meier et al. ; rosenberg et al. ; gu et al. ; blanc et al. ; rayon-estrada et al. ; snyder et al. ; blanc et al. ; kanata et al. ), representing distinct rna editing targets. % ( / ) of rna targets were edited at one chromosomal location (figure c) and % ( / ) of mrna targets were edited at both a single chromosomal location and also within a single tissue (figure d). the majority of editing sites occur in the ’ untranslated region ( / ; %), with exonic editing sites the next most abundant subgroup ( / ; %, figure e). chromosome x harbors the highest number of editing sites ( / ; %), followed by chromosomes and ( / ; . % for both, supplemental figure ). / editing sites were confirmed by sanger sequencing, with a mean editing frequency of ± %.
base content of sequences flanking edited and mutated cytidines
au content was enriched (~ %) in nucleotides both immediately upstream and downstream of the edited cytidine across mouse rna editing targets (figure a and c). the average au content across the region nucleotides upstream to nucleotides downstream of the edited cytidine was ~ % ( - %). because apobec has been shown to be a dna mutator (harris et al. ; wolfe et al. ; wolfe et al. ), we determined the au content of the mutated deoxycytidine region flanking human dna targets (nik-zainal et al. ) to be ~ % at a site one nucleotide downstream of the edited base (figure b, c). the average au content in the sequence nucleotides upstream and nucleotides downstream of mutated deoxycytidines is % ( - . %). the average au content was % and % in nucleotides immediately upstream and downstream, respectively, of the targeted deoxycytidine in a
subgroup of over dna editing events of the c to t type (nik-zainal et al. ), which is closer to the distribution found in c to u rna editing targets. these features suggest that au enrichment is an important component of the editing function of apobec on both rna and dna targets, especially for the c/dc to u/dt change.
factors influencing editing frequency
regulatory-spacer-mooring cassette: we observed no significant associations between editing frequency and mismatches in motif a (r=- . , p=. ) or motif b (r=- . , p=. ) (supplemental figure ), while mismatches in motifs c and d negatively impacted editing frequency (motif c: r=- . , p=. ; motif d: r=- . , p=. , figure b). au content of motif b showed a trend towards negative association with editing frequency (r=- . , p=. figure c), but au contents of motifs a (r= . , p=. ), c (r=- . , p=. ), and d (r=- . , p=. ) did not impact editing frequency (supplemental figure ). the abundance of g in motif c (r= . , p=. ), abundance of c in motif b (r= . , p=. ), and g/c fraction in motif c (r= . , p=. ) showed either a significant association or a trend towards association with editing frequency. the spacer sequence averaged ± nucleotides, ranging from to , with a trend towards association between length and editing frequency (r=- . , p=. ). the mean spacer sequence au content was ± %, with no association between editing frequency and au content (r=- . , p=. , supplemental figure ). however, g abundance (r=- . , p=. ) and g/c fraction (r=- . , p=. ) of the spacer showed significant associations with editing frequency in sanger-confirmed targets. the mean number of mismatches in the first nucleotides of the spacer sequence was . ± with a higher number of mismatches exerting a significant negative impact on editing frequency (r=- . , p=. ) (figure d). the mean number of mismatches in the mooring sequence was . ± . , ranging from to nucleotides.
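the r values quoted in this subsection are pearson correlation coefficients (see statistical methodology). as a minimal sketch of that calculation, the snippet below computes r between a per-site mismatch count and editing frequency. the function name `pearson_r` and the toy numbers are illustrative assumptions, not the authors' code or data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data (NOT from the study): mooring-sequence mismatches
# per editing site and the corresponding editing frequency in percent.
mooring_mismatches = [0, 1, 2, 3, 4]
editing_frequency = [80.0, 60.0, 55.0, 30.0, 20.0]

r = pearson_r(mooring_mismatches, editing_frequency)
print(f"r = {r:.3f}")  # negative: more mismatches, lower editing frequency
```

a p value for r would additionally require a t distribution with n-2 degrees of freedom, which is left out here to keep the sketch dependency-free.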
the number of mismatches showed a significant negative association with editing frequency (r=- . , p=. , figure e). the base content of individual nucleotides surrounding the edited cytidine showed significant associations with editing frequency, which were more pronounced in nucleotides closer to the edited cytidine (figure f, supplemental table ). furthermore, overall au content of downstream sequence + to + had a positive impact on editing frequency (r= . , p=. ) (supplemental figure ). however, g abundance in downstream nucleotides (r=- . , p=. ) and g/c fraction in downstream nucleotides (r=- . , p=. ) showed a significant negative association, or a trend towards one, with editing frequency in sanger-confirmed targets. secondary structure: we generated a predicted secondary structure for editing sites, with four subgroups based on overall structure and location of the edited cytidine: loop (cloop), stem (cstem), tail (ctail), and non-canonical structure (nc). the majority of editing sites were in the cloop subgroup ( %), followed by cstem ( %), ctail ( %), and nc ( %) subgroups (figure a). editing sites in the ctail subgroup exhibited lower editing frequencies compared to editing sites in cloop ( ± vs ± %, p=. ) or cstem ( ± %, p=. ) subgroups. no significant differences were detected in other comparisons (figure b). the edited cytidine was located in loop, stem, and tail of the secondary structure in ( %), ( %), and ( %) of the edited rnas, respectively. editing sites with the edited cytidine within the loop exhibited significantly higher editing frequency compared to those with the edited cytidine in the tail ( ± % vs ± %, p=. ).
other subgroups exhibited comparable editing frequencies (supplemental figure ). the majority ( %) of editing sites contained a mooring sequence located in the main stem-loop structure (figure c), with the remainder located in the tail or secondary loop. average editing efficiency was significantly higher in targets where the mooring sequence was located in the main stem-loop (figure d). we also calculated the proportion of total nucleotides that constitute the main stem-loop in the secondary structure. the average ratio was . ± . ranging from . to (supplemental table ) with higher ratios associated with higher editing frequency of the corresponding editing site (r= . , p=. ) (figure e). finally, we considered the orientation of free tails in the secondary structure in terms of length and symmetry. symmetric free tails were observed in % of editing sites (supplemental figure ). the length of the ’ free tail showed a negative association with editing frequency (r=- . , p=. , figure f) while no significant associations were detected between either the length of the ’ tail or symmetry of tails and editing frequency (supplemental figure ). trans-acting factors and tissue specificity: data for relative dominance of cofactors in apobec -dependent rna editing were available for editing sites for targets in small intestine or liver (blanc et al. ). rbm was identified as the dominant factor in / ( %) sites; a cf was the dominant factor in / ( %) editing sites with the remaining sites ( / ; %) exhibiting equal co-dominance (figure a). the average editing frequencies at editing sites revealed differences across the groups with ± % in rbm -dominant targets, ± % in a cf-dominant, and ± % in the co-dominant group (p=.
) (figure b). the majority of rna editing targets were edited in one tissue ( / ; % figure c), while the maximum number of tissues in which an editing target is edited (at the same site) is (cd ). the small intestine harbors the highest number of verified editing sites ( / ; %), followed by liver ( / ; %), and adipose tissue ( / ; % figure d). sites edited in brain tissue showed the highest average editing frequency ( ± %, n= ), followed by bone marrow myeloid cells ( ± %, n= ), and kidney ( ± %, n= figure e). we then developed a multivariable linear regression model to predict apobec dependent c-to-u rna editing efficiency, incorporating factors independently associated with editing frequencies (table ). this model, based on sanger-confirmed editing sites with available data for all of the parameters mentioned, accounted for % of variance in editing frequency of editing sites included (r = . , p<. table ). the final multivariable model revealed several factors independently associated with editing frequency, specifically the number of mismatches in the mooring sequence; regulatory sequence motif d; au content of regulatory sequence motif b; overall secondary structure for group ctail vs group cloop; location of the mooring sequence in the secondary structure; and a “base content score” parameter that represents base content of the sequences flanking the edited cytidine (table ). removing “base content score” from the model reduced its explanatory power from r = . to r = . . next, we added a co-factor dominance variable and fit the model using the editing sites with available data for cofactor dominance.
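to make the "variance accounted for" comparison concrete, the sketch below fits an ordinary least-squares model with and without one predictor and compares the resulting r-squared values, mirroring the way removing a parameter lowers the explanatory power of the model. it is an illustration only: the helper names `fit_ols` and `r_squared`, the two predictors chosen, and all numbers are assumptions, and the authors' actual software and variable coding are not given in this section.

```python
def fit_ols(rows, y):
    """Return least-squares coefficients (intercept first) via the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda row_idx: abs(A[row_idx][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit the model")
        A[col], A[pivot] = A[pivot], A[col]
        for row_idx in range(k):
            if row_idx != col:
                factor = A[row_idx][col] / A[col][col]
                A[row_idx] = [a - factor * b for a, b in zip(A[row_idx], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def r_squared(rows, y, beta):
    """Fraction of the variance in y explained by the fitted model."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data (NOT from the study). Each row is
# (mooring mismatches, AU content of motif B in %); y is editing frequency in %.
full_rows = [(0, 70), (1, 65), (2, 72), (3, 60), (4, 68), (1, 75)]
frequency = [82.0, 66.0, 49.0, 38.0, 18.0, 70.0]
reduced_rows = [(mismatches,) for mismatches, _ in full_rows]

r2_full = r_squared(full_rows, frequency, fit_ols(full_rows, frequency))
r2_reduced = r_squared(reduced_rows, frequency, fit_ols(reduced_rows, frequency))
print(f"full model R^2 = {r2_full:.3f}, reduced model R^2 = {r2_reduced:.3f}")
```

for nested least-squares models the reduced fit can never explain more variance than the full one, so the size of the drop, rather than its direction, is what indicates how much a parameter contributes.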
along with other factors mentioned above, co-factor dominance showed significant association with editing frequency (table ) with rnas targeted by both rbm and a cf observed to be edited at a lower frequency than rbm dominant targets. factors associated with co-factor dominance (figure , supplemental table , supplemental figure ) included tissue-specificity, with higher frequency of rbm -dominant sites in small intestine compared to liver ( vs %, p=. ) and a cf-dominant and co-dominant editing sites more prevalent in liver. the number of mooring sequence mismatches also varied among the three subgroups: . ± . in the rbm -dominant subgroup; . ± . in the a cf-dominant subgroup; and . ± . in the co-dominant subgroup (p=. ). this was also the case regarding mismatches in the spacer: . ± . in the rbm -dominant subgroup; . ± . in the a cf-dominant subgroup; . ± . in the co-dominant subgroup (p=. ). au content (%) of downstream sequence + to + was higher in the rbm -dominant subgroup (p=. ). finally, the location of the edited cytidine in the secondary structure of the mrna strand differed across the three subgroups (p=. , figure ). we used pairwise multinomial logistic regression to determine factors independently associated with co-factor dominance (figure c, supplemental table ). ctail editing sites, those with more mismatches in mooring and regulatory motif c, lower au content in downstream sequence, and higher au content in regulatory motif d were more likely co-dominant. editing sites from small intestine and those with higher au content of downstream sequence were more likely rbm -dominant. editing sites from liver and those with more mismatches in regulatory motif b were more likely a cf-dominant (figure c).
human mrna targets
finally, we turned to an analysis of human c-to-u rna editing targets for which this same panel of parameters was available (table ). aside from apob rna, which is known to be edited in the small intestine (chen et al. ; powell et al. ), other targets have been identified in central or peripheral nervous tissue (skuse et al. ; mukhopadhyay et al. ; meier et al. ; schaefermeier and heinze ). the human targets were categorized into low editing (nf , glyrα , glyrα ) and high editing (apob, tph b exon , tph b exon ) subgroups using % as cut-off. a composite score (maximum= ) was generated based on six parameters introduced in the mouse model with notable variance between the two subgroups including mismatches in mooring sequence, spacer length, location of the edited cytidine, and relative abundance of stem-loop bases (table ). high editing targets exhibited a significantly higher composite score ( . vs , p=. ) compared to low editing targets and the composite score significantly correlated with editing frequency in individual targets (r= . , p=. ). the canonical editing target apob (chen et al. ; powell et al. ) achieved a score of (out of ), reflecting the observation that one of the six parameters (au% of regulatory motifs) in human apob is non-preferential compared to the editing-promoting features identified in the mouse multivariable model.
discussion
the current study reflects our analysis of c-to-u rna editing sites from target mrnas, with the majority residing within the ’ untranslated region.
our multivariable model identified several key factors influencing editing frequency, including host tissue, base content of nucleotides surrounding the edited cytidine, number of mismatches in regulatory and mooring sequences, au content of the regulatory sequence, overall secondary structure, location of the mooring sequence, and co-factor dominance. these factors, each exerting independent effects, together accounted for % of the variance in editing frequency. our findings also showed that mismatches in the mooring and regulatory sequences, au content of regulatory and downstream sequences, host tissue and secondary structure of target mrna were associated with the pattern of co-factor dominance. several aspects of these primary conclusions merit further discussion. previous studies investigating the key factors that regulate c-to-u mrna editing were confined to in vitro studies and predicated on a single mrna target (apob) (backus and smith ; shah et al. ; smith et al. ; backus and smith ; hersberger and innerarity ). with the expanded range of verified c-to-u rna editing targets now available for interrogation, we revisited the original assumptions to understand more globally the determinants of c-to-u mrna editing efficiency. in undertaking this analysis, we were reminded that the requirements for c-to-u mrna editing in vitro often appear more stringent than in vivo (backus and smith ; shah et al. ), which further emphasizes the importance of our findings. in addition, our approach included both cis-acting sequence- and folding-related predictions along with the role of trans-acting factors and took advantage of statistical modeling to adjust for confounding or modifier effects between these factors to identify their role in editing frequency.
we began with the assumptions established for apob rna editing which identified a nucleotide segment encompassing the edited base, spacer, mooring sequence, and part of regulatory sequence as the minimal sequence competent for physiological editing in vitro and in vivo (davies et al. ; shah et al. ; backus and smith ). those studies identified an -nucleotide mooring sequence as essential and sufficient for editosome assembly and site-specific c-to-u editing (backus and smith ; shah et al. ; backus and smith ) and established optimal positioning of the mooring sequence relative to the edited base in apob rna (backus and smith ). the current work supports the key conclusions of this original mooring sequence model as applied to the entire range of c-to-u rna editing targets. we observed that mismatches in either the mooring or regulatory sequences were independent factors governing editing frequency. by contrast, while mismatches in the spacer sequence also showed negative association with editing frequency, the impact of spacer mismatches was not retained in the final model, nor was the length of the spacer associated with editing frequency. furthermore, we found mismatches in the regulatory sequence motif c to be more important than mismatches in motif b. these inconsistencies might conceivably reflect the context in which an rna segment is studied (backus and smith ). for example, our analysis reflects physiological conditions in which naturally occurring mrna targets are edited, while the aforementioned study used in vitro data based on varying lengths of apob mrna embedded within different mrna contexts (apoe rna) (backus and smith ). in addition to the components of the mooring sequence model, we examined variations in the base content in different segments/motifs as well as among individual nucleotides surrounding the edited cytidine.
as expected, we found that sequences flanking the edited cytidine exhibited high au content. we further observed a similarly high au content in the flanking sequences of a range of proposed apobec-mediated dna mutation targets in human cancer tissues and cell lines (alexandrov et al. ; petljak et al. ), especially in targets with dc/dt change (nik-zainal et al. ). this observation implies that apobec-mediated dna and rna editing frequency may each be functionally modified by au enrichment in the flanking sequences surrounding modifiable bases. the base content in individual nucleotides surrounding the edited cytidine also exerted significant impact on editing frequency, particularly in a -nucleotide segment spanning the edited cytidine (supplemental table ), accounting for % of the variance in editing frequency independent of the mooring sequence model. our findings regarding individual nucleotides surrounding the edited cytidine are consistent with findings for both dna and rna editing targets, particularly in the setting of cancers (backus and smith ; conticello ; roberts et al. ; saraconi et al. ; gao et al. ; arbab et al. ). recent work examining the sequence-editing relationship of a large in vitro library of dna targets edited by different synthetic cytidine base editors (cbes) (arbab et al. ) showed that the base content of a -nucleotide window spanning the edited cytidine explained - % of the editing variance, in particular one or two nucleotides immediately ’ of the edited nucleotide. that study also demonstrated that occurrence of t and c nucleotides at position - increased, while a g nucleotide at that position decreased, editing frequency (arbab et al. ).
however, in contrast to our findings, the presence of a at position - had either a negative or null effect on dna editing activity (arbab et al. ). this latter finding is consistent with the lower au content observed in nucleotides adjacent to the edited cytidine in apobec- dna targets compared to the au content in rna targets. our findings assign a greater importance of adjacent nucleotides in rna editing frequency, similar to earlier reports that the five bases immediately ’ of the edited cytidine in apob mrna exert a greater impact on editing activity compared to nucleotides further upstream of this segment (backus and smith ; shah et al. ; backus and smith ). g/c fraction of a -nucleotide window spanning the edited cytidine in dna targets is associated with editing activity of the synthetic cbes (arbab et al. ). although we found significant associations of rna editing with g/c fraction in segments surrounding the edited cytidine in univariate analyses, these associations were not retained in the final model. in contrast, the au content of regulatory sequence motif b remained as an independent factor determining editing frequency in the final model. the conserved -nucleotide sequence around the edited c forms a stem-loop secondary structure, where the editing site is in an octa-loop (richardson et al. ) as predicted for the -nucleotide sequence of apob mrna (shah et al. ). this stem-loop structure is predicted to play an important role in recognition of the editing site by the editing factors (bostrom et al. ; davies et al. ; driscoll et al. ; chen et al. ). mutations resulting in loss of base pairing in peripheral parts of the stem did not impact the editing frequency (shah et al. ).
editing sites with the edited cytidine located in central parts (e.g. loop) exhibited higher editing frequencies than those with the edited cytidine located in peripheral parts (e.g. tail), and it is worth noting that the computer-based stem-loop structure was independently confirmed by nmr studies of a -nucleotide human apob mrna (maris et al. ). those studies demonstrated that the location of the mooring sequence in the apob mrna secondary structure plays a critical role in the rna recognition by a cf (maris et al. ). in line with those findings, the current findings emphasize that the location of the mooring sequence in the secondary structure of the target mrna exerts significant independent impact on editing frequency. these predictions were confirmed in crystal structure studies of the carboxyl-terminal domain of apobec- and its interaction with cofactors and substrate rna (wolfe et al. ). our conclusions regarding determinants of murine c-to-u editing frequency, such as mooring sequence, base content, and secondary structure, appear consistent with a similar regulatory role among the smaller number of verified human targets. that being said, further study and expanded understanding of the range of c-to-u editing targets in human tissues will be needed as recently suggested (destefanis et al. ), analogous to that for a-to-i editing (bahn et al. ; bazak et al. ). we recognize that other factors likely contribute to the variance in rna editing frequency not covered by our model.
we did not consider the role of naturally occurring variants in apobec , for example, which may be a relevant consideration since mutations in apobec family genes were shown to modify the editing activity of related hybrid dna cytosine base editors (arbab et al. ). furthermore, genetic variants of apobec in humans were associated with altered frequency of glyr editing (kankowski et al. ). other factors not included in our approach were entropy-related features, tertiary structure of the mrna target and other regulatory co-factors. another limitation in the tissue-specific designation used to categorize editing frequency is that cell specific features of editing frequency may have been overlooked. for example, small intestinal and liver preparations are likely a blend of cell types (macparland et al. ; elmentaite et al. ) and tumor tissues are highly heterogeneous in cellular composition (barker et al. ). the current findings provide a platform for future approaches to resolve these questions.
materials and methods
search strategy
we conducted a comprehensive literature review from (when apob rna editing was first reported (chen et al. ; powell et al. )) to november , covering studies published in english that reported c-to-u mrna editing frequencies of individual or transcriptome-wide target genes. databases searched included medline, scopus, web of science, google scholar, and proquest (for theses). the references of full texts retrieved were also scrutinized for additional papers not indexed in the initial search.
study selection
primary records (n= ) were screened for relevance, and in vivo studies reporting editing frequencies of individual or transcriptome-wide apobec -dependent c-to-u mrna targets were selected, using a threshold of % editing frequency. for analyses based on rna sequence information, only targets with available sequence information or chromosomal location for the edited cytidine were included. exclusion criteria included: studies that reported c-to-u mrna editing frequencies of target genes in other species, studies reporting editing frequencies of target genes in animal models overexpressing apobec , exclusively in vitro studies, and conference abstracts.
human targets
we included studies reporting human c-to-u mrna targets (chen et al. ; powell et al. ; skuse et al. ; mukhopadhyay et al. ; grohmann et al. ; schaefermeier and heinze ). we also included work describing apobec -mediated mutagenesis in human breast cancer (nik-zainal et al. ).
data extraction
two reviewers (ss and vb) conducted the extraction process independently and discrepancies were resolved by consensus with input from a third reviewer (nod). the parameters were categorized as follows: general parameters: gene name (rna target), chromosomal and strand location of the edited cytidine, tissue site, editing frequency determined by rna-seq or sanger sequencing as illustrated for apob (figure a). editing frequencies determined by the two approaches were highly correlated (r= . p< . ), and where both methodologies were available we used rna-seq.
we also defined relative dominance of editing co-factors (a cf-dominant, rbm -dominant, or co-dominant), relative mrna expression (edited gene vs unedited gene) by rna-seq or quantitative rt-pcr, and abundance of corresponding protein (edited gene vs unedited gene) by western blotting or proteomic comparison. co-factor dominance was determined based on the relative contribution of each co-factor to editing frequency. for each editing site, editing frequencies in mouse tissues deficient in a cf or rbm were compared to those of wild-type mice. the relative contribution of each co-factor was calculated by subtracting the editing frequency for each target in a cf or rbm knockout tissue from the total editing frequency in wild-type control. editing sites with < % difference between contributions of rbm and a cf were considered co-dominant. sites with ≥ % difference were considered either rbm - or a cf-dominant, depending on the co-factor with higher contribution (blanc et al. ). sequence-related parameters: a sequence spanning nucleotides upstream and nucleotides downstream of the edited cytidine was extracted for each c-to-u mrna editing site. these sequences were extracted either directly from the full-text or using the online ucsc genome browser on mouse (ncbi /mm ) and human (grch /hg ) (https://genome.ucsc.edu/cgi-bin/hggateway). using the mooring sequence model (backus and smith ), three cis-acting elements were considered for each site. these elements included ) a -nucleotide segment immediately upstream of the edited cytidine as “regulatory sequence”; ) a -nucleotide segment downstream of the edited cytidine with complete or partial consensus with the
canonical “mooring sequence” of apob mrna; ) the sequence between the edited cytidine and the ’ end of the mooring sequence, referred to as “spacer”. we used an unbiased approach to identify potential mooring sequences by taking the nearest segment to the edited cytidine with lowest number of mismatch(es) compared to the canonical mooring sequence of apob rna. for each of the three segments, we investigated the number of mismatches compared to the corresponding segment of apob gene (blanc et al. ), as well as length of spacer, the abundance of a and u nucleotides (au content) and the g to c abundance ratio (g/c fraction (arbab et al. )). we also calculated relative abundance of a, g, c, and u individually across a region nucleotides upstream and nucleotides downstream of the edited cytidine across all editing sites. for comparison, we examined the base content of a sequence spanning nucleotides upstream and downstream of mutated deoxycytidine for over proposed c to x (t, a, and g) dna mutation targets of apobec family in human breast cancer (nik-zainal et al. ) along with relative deoxynucleotide distribution in proximity to the edited site. secondary structure parameters: we used rna-structure (reuter and mathews ) and mfold (zuker ) to determine the secondary structure of an rna cassette consisting of regulatory sequence, edited cytidine, spacer, and mooring sequence. secondary structures similar to that of the cassette for apob chr : consisting of one loop and stem (with or without unassigned nucleotides with ≤ unpaired bases inside the stem) as the main stem-loop with or without free tail(s) in one or both ends of the stem were considered as canonical. two other types of secondary structure were considered as non-canonical structures (figure b), with ≥ loops located either at ends of the stem or inside the stem. loops inside the stem were circular open structures with ≥ unpaired bases. 
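the co-factor dominance rule and the sequence-related parameters described above can be sketched as follows. this is an illustration under stated assumptions, not the authors' code: the dominance threshold is passed in as a parameter because its value is not legible in the text, ties between equally good mooring candidates are assumed to go to the window nearest the edited cytidine, and the function names, co-factor labels (`acf`, `rbm`) and toy sequences are hypothetical.

```python
def cofactor_dominance(wild_type, acf_knockout, rbm_knockout, threshold):
    """Classify a site from editing frequencies (%) in wild-type and knockout tissue.

    Each co-factor's contribution is the wild-type frequency minus the
    frequency left in that co-factor's knockout. `threshold` is the minimum
    difference between contributions needed to call one co-factor dominant.
    """
    acf_contribution = wild_type - acf_knockout
    rbm_contribution = wild_type - rbm_knockout
    if abs(acf_contribution - rbm_contribution) < threshold:
        return "co-dominant"
    if acf_contribution > rbm_contribution:
        return "acf-dominant"
    return "rbm-dominant"


def count_mismatches(segment, reference):
    """Number of positions at which two equal-length sequences differ."""
    if len(segment) != len(reference):
        raise ValueError("segments must have equal length")
    return sum(a != b for a, b in zip(segment, reference))


def au_content(segment):
    """Percentage of A and U bases in an RNA segment."""
    if not segment:
        raise ValueError("empty segment")
    return 100.0 * sum(base in "AU" for base in segment.upper()) / len(segment)


def find_mooring(downstream, canonical):
    """Return (spacer_length, mismatches) for the best mooring candidate.

    Every window of len(canonical) in the sequence downstream of the edited
    cytidine is scored against the canonical mooring sequence. The lowest
    mismatch count wins; ties go to the smallest offset (assumption).
    """
    width = len(canonical)
    if len(downstream) < width:
        raise ValueError("downstream sequence shorter than mooring sequence")
    candidates = [
        (count_mismatches(downstream[i:i + width], canonical), i)
        for i in range(len(downstream) - width + 1)
    ]
    mismatches, offset = min(candidates)
    return offset, mismatches


# Toy inputs only: "UGAUC" stands in for the canonical mooring sequence.
print(cofactor_dominance(60.0, 50.0, 10.0, threshold=20.0))
print(find_mooring("AAUGAUCAAUGAUC", "UGAUC"))
print(au_content("AUGC"))
```

the offset returned by `find_mooring` doubles as the spacer length, since the spacer is the stretch between the edited cytidine and the start of the mooring sequence.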
editing sites with canonical structure were further categorized into three subgroups based on the location of the edited cytidine: loop (cloop), stem (cstem), or tail (ctail). in addition to overall secondary structure, we considered the location of the edited cytidine, the location of the mooring sequence, the symmetry of the free tails, and the proportion of the nucleotides in the target cassette that constitute the main stem-loop. this proportion is . in the case of apob chr : , where all the bases are part of the main stem-loop structure. symmetry was defined based on the existence of free tails at both ends of the rna strand.

statistical methodology

continuous variables are reported as means ± sd, and binary and categorical variables as relative proportions. t-tests and anova were used to compare continuous parameters of interest between two or more than two groups, respectively. chi-squared testing was used to compare binary or categorical variables among different groups. pearson r was used to investigate the correlation of two continuous variables. we used linear regression analyses to develop the final model of independent factors that correlate with editing frequency. we used the hosmer and lemeshow approach for model building (hosmer jr et al. ) to fit the multivariable regression model. in brief, we first used bivariate and/or simple regression analyses with a p value of . as the cut-off point to screen the variables and detect primary candidates for the multivariable model. subsequently, we fitted the primary multivariable model using candidate variables from the screening phase. a backward elimination method was employed to reach the final multivariable model. parameters with p values < .
or those that added to the model fitness were retained. next, the eliminated parameters were added back individually to the final model to determine their impact. plausible interaction terms between final determinants were also checked. the final model was screened for collinearity. we used the same approach to develop a multinomial logistic regression model to identify factors that were independently associated with co-factor dominance in rna editing sites. r-squared and pseudo r-squared were used to estimate the proportion of variance in the response variable that could be explained by the multivariable linear regression and multinomial logistic regression models, respectively. the same screening and retaining methods were used to investigate the association of base content in a sequence nucleotides upstream and nucleotides downstream of the edited cytidine with editing frequency. however, after determining the nucleotides that were retained in the final regression model, a proxy parameter named “base content score” was calculated for each editing site based on the β coefficient values retrieved for individual nucleotides in the model. this parameter was used in the final model as a representative variable for the base content of the aforementioned sequence in each editing site.
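one way to read the “base content score” is sketched below. the text states only that the score is based on the β coefficients of the retained nucleotides; summing the β of every retained position whose base falls in that position's preferred set is an assumption made here for illustration, and the positions, base sets and coefficients in the example are invented.

```python
def base_content_score(flank, model):
    """Sum the coefficients of retained positions that carry a preferred base.

    flank: maps a position relative to the edited cytidine (e.g. -2, +1)
           to the base observed at that position in one editing site.
    model: maps each retained position to (preferred_bases, beta), where
           preferred_bases is a string such as "AU".
    Summation is an assumed scoring rule, not one confirmed by the text.
    """
    score = 0.0
    for position, (preferred_bases, beta) in model.items():
        # A missing position yields None, which is never in the set.
        if flank.get(position) in set(preferred_bases):
            score += beta
    return score


# Invented model: three retained positions with hypothetical coefficients.
toy_model = {-2: ("AU", 1.5), 1: ("G", 2.0), 3: ("C", 0.5)}
toy_site = {-2: "A", 1: "C", 3: "C"}
print(base_content_score(toy_site, toy_model))
```

collapsing the per-nucleotide terms into one score keeps the final multivariable model small, which is the role the text assigns to this proxy parameter.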
acknowledgments

this work was supported by national institutes of health grants dk- , dk- , and washington university digestive diseases research core center p dk- (to nod).

references

ucsc genome browser on mouse (ncbi /mm ; ) and human (grch /hg ; ) assemblies.
alexandrov lb, nik-zainal s, wedge dc, aparicio sa, behjati s, biankin av, bignell gr, bolli n, borg a, borresen-dale al et al. . signatures of mutational processes in human cancer. nature : - .
arbab m, shen mw, mok b, wilson c, matuszek z, cassa ca, liu dr. . determinants of base editing outcomes from target library analysis and machine learning. cell : - e .
backus jw, schock d, smith hc. . only cytidines ' of the apolipoprotein b mrna mooring sequence are edited. biochim biophys acta : - .
backus jw, smith hc. . apolipoprotein b mrna sequences ' of the editing site are necessary and sufficient for editing and editosome assembly. nucleic acids res : - .
backus jw, smith hc. . three distinct rna sequence elements are required for efficient apolipoprotein b (apob) rna editing in vitro. nucleic acids res : - .
bahn jh, lee jh, li g, greer c, peng g, xiao x. . accurate identification of a-to-i rna editing in human by transcriptome sequencing. genome res : - .
barker n, ridgway ra, van es jh, van de wetering m, begthel h, van den born m, danenberg e, clarke ar, sansom oj, clevers h. . crypt stem cells as the cells-of-origin of intestinal cancer. nature : - .
bazak l, haviv a, barak m, jacob-hirsch j, deng p, zhang r, isaacs fj, rechavi g, li jb, eisenberg e et al. . a-to-i rna editing occurs at over a hundred million genomic sites, located in a majority of human genes. genome res : - .
blanc v, henderson jo, newberry ep, kennedy s, luo j, davidson no. . targeted deletion of the murine apobec- complementation factor (acf) gene results in embryonic lethality. molecular and cellular biology : - .
blanc v, park e, schaefer s, miller m, lin y, kennedy s, billing am, ben hamidane h, graumann j, mortazavi a et al. . genome-wide identification and functional analysis of apobec- -mediated c-to-u rna editing in mouse small intestine and liver. genome biol : r .
blanc v, xie y, kennedy s, riordan jd, rubin dc, madison bb, mills jc, nadeau jh, davidson no. . apobec complementation factor (a cf) and rbm interact in tissue-specific regulation of c to u rna editing in mouse intestine and liver. rna : - .
bostrom k, lauer sj, poksay ks, garcia z, taylor jm, innerarity tl. . apolipoprotein b rna editing in chimeric apolipoprotein eb mrna. j biol chem : - .
chen sh, habib g, yang cy, gu zw, lee br, weng sa, silberman sr, cai sj, deslypere jp, rosseneu m et al. . apolipoprotein b- is the product of a messenger rna with an organ-specific in-frame stop codon. science : - .
chen sh, li xx, liao ws, wu jh, chan l. . rna editing of apolipoprotein b mrna. sequence specificity determined by in vitro coupled transcription editing. j biol chem : - .
conticello sg. . creative deaminases, self-inflicted damage, and genome evolution. annals of the new york academy of sciences : - .
davies ms, wallis sc, driscoll dm, wynne jk, williams gw, powell lm, scott j. . sequence requirements for apolipoprotein b rna editing in transfected rat hepatoma cells. j biol chem : - .
destefanis e, avsar g, groza p, romitelli a, torrini s, pir p, conticello sg, aguilo f, dassi e. . a mark of disease: how mrna modifications shape genetic and acquired pathologies. rna.
driscoll dm, wynne jk, wallis sc, scott j. . an in vitro system for the editing of apolipoprotein b mrna. cell : - .
elmentaite r, ross adb, roberts k, james kr, ortmann d, gomes t, nayak k, tuck l, pritchard s, bayraktar oa et al. . single-cell sequencing of developing human gut reveals transcriptional links to childhood crohn's disease. dev cell.
fossat n, tourle k, radziewic t, barratt k, liebhold d, studdert jb, power m, jones v, loebel da, tam pp. . c to u rna editing mediated by apobec requires rna-binding protein rbm . embo rep : - .
gao j, choudhry h, cao w. . apolipoprotein b mrna editing enzyme catalytic polypeptide-like family genes activation and regulation during tumorigenesis. cancer science : - .
giannoni f, bonen dk, funahashi t, hadjiagapiou c, burant cf, davidson no. . complementation of apolipoprotein b mrna editing by human liver accompanied by secretion of apolipoprotein b . j biol chem : - .
grohmann m, hammer p, walther m, paulmann n, buttner a, eisenmenger w, baghai tc, schule c, rupprecht r, bader m et al. . alternative splicing and extensive rna editing of human tph transcripts. plos one : e .
gu t, buaas fw, simons ak, ackert-bicknell cl, braun re, hibbs ma. . canonical a-to-i and c-to-u rna editing is enriched at 'utrs and microrna target sites in multiple mouse tissues. plos one : e .
harris rs, bishop kn, sheehy am, craig hm, petersen-mahrt sk, watt in, neuberger ms, malim mh. . dna deamination mediates innate immunity to retroviral infection. cell : - .
hersberger m, innerarity tl. . two efficiency elements flanking the editing site of cytidine in the apolipoprotein b mrna support mooring-dependent editing. j biol chem : - .
hirano k, young sg, farese rv, jr., ng j, sande e, warburton c, powell-braxton lm, davidson no. . targeted disruption of the mouse apobec- gene abolishes apolipoprotein b mrna editing and eliminates apolipoprotein b . j biol chem : - .
hosmer jr dw, lemeshow s, sturdivant rx. . applied logistic regression. john wiley & sons.
hospattankar av, higuchi k, law sw, meglin n, brewer hb, jr. . identification of a novel in-frame translational stop codon in human intestine apob mrna. biochem biophys res commun : - .
kanata e, llorens f, dafou d, dimitriadis a, thune k, xanthopoulos k, bekas n, espinosa jc, schmitz m, marin-moreno a et al. . rna editing alterations define manifestation of prion diseases. proc natl acad sci u s a : - .
kankowski s, forstera b, winkelmann a, knauff p, wanker ee, you xa, semtner m, hetsch f, meier jc. . a novel rna editing sensor tool and a specific agonist determine neuronal protein expression of rna-edited glycine receptors and identify a genomic apobec dimorphism as a new genetic risk factor of epilepsy. front mol neurosci : .
lellek h, kirsten r, diehl i, apostel f, buck f, greeve j. . purification and molecular cloning of a novel essential component of the apolipoprotein b mrna editing enzyme- complex. j biol chem : - .
macparland sa, liu jc, ma xz, innes bt, bartczak am, gage bk, manuel j, khuu n, echeverri j, linares i et al. . single cell rna sequencing of human liver reveals distinct intrahepatic macrophage populations. nat commun : .
maris c, masse j, chester a, navaratnam n, allain fh. . nmr structure of the apob mrna stem-loop and its interaction with the c to u editing apobec complementary factor. rna : - .
mehta a, kinter mt, sherman ne, driscoll dm. . molecular cloning of apobec- complementation factor, a novel rna-binding protein involved in the editing of apolipoprotein b mrna. mol cell biol : - .
meier jc, henneberger c, melnick i, racca c, harvey rj, heinemann u, schmieden v, grantyn r. . rna editing produces glycine receptor alpha (p l), resulting in high agonist potency. nat neurosci : - .
mukhopadhyay d, anant s, lee rm, kennedy s, viskochil d, davidson no. . c-->u editing of neurofibromatosis mrna occurs in tumors that express both the type ii transcript and apobec- , the catalytic subunit of the apolipoprotein b mrna-editing enzyme. am j hum genet : - .
nik-zainal s, alexandrov lb, wedge dc, van loo p, greenman cd, raine k, jones d, hinton j, marshall j, stebbings la et al. . mutational processes molding the genomes of breast cancers. cell : - .
petljak m, alexandrov lb, brammeld js, price s, wedge dc, grossmann s, dawson kj, ju ys, iorio f, tubio jmc et al. . characterizing mutational signatures in human cancer cell lines reveals episodic apobec mutagenesis. cell : - e .
powell lm, wallis sc, pease rj, edwards yh, knott tj, scott j. . a novel form of tissue-specific rna processing produces apolipoprotein-b in intestine. cell : - .
rayon-estrada v, harjanto d, hamilton ce, berchiche ya, gantman ec, sakmar tp, bulloch k, gagnidze k, harroch s, mcewen bs et al. . epitranscriptomic profiling across cell types reveals associations between apobec -mediated rna editing, gene expression outcomes, and cellular function. proc natl acad sci u s a : - .
reuter js, mathews dh. . rnastructure: software for rna secondary structure prediction and analysis. bmc bioinformatics : .
richardson n, navaratnam n, scott j. . secondary structure for the apolipoprotein b mrna editing site. au-binding proteins interact with a stem loop. j biol chem : - .
roberts sa, lawrence ms, klimczak lj, grimm sa, fargo d, stojanov p, kiezun a, kryukov gv, carter sl, saksena g et al. . an apobec cytidine deaminase mutagenesis pattern is widespread in human cancers. nat genet : - .
rosenberg br, hamilton ce, mwangi mm, dewell s, papavasiliou fn. . transcriptome-wide sequencing reveals numerous apobec mrna-editing targets in transcript ' utrs. nat struct mol biol : - .
saraconi g, severi f, sala c, mattiuz g, conticello sg. . the rna editing enzyme apobec induces somatic mutations and a compatible mutational signature is present in esophageal adenocarcinomas. genome biol : .
schaefermeier p, heinze s. . hippocampal characteristics and invariant sequence elements distribution of glra and glra c-to-u editing. mol syndromol : - .
shah rr, knott tj, legros je, navaratnam n, greeve jc, scott j. . sequence requirements for the editing of apolipoprotein b mrna. j biol chem : - .
skuse gr, cappione aj, sowden m, metheny lj, smith hc. . the neurofibromatosis type i messenger rna undergoes base-modification rna editing. nucleic acids res : - .
smith hc, kuo sr, backus jw, harris sg, sparks ce, sparks jd. . in vitro apolipoprotein b mrna editing: identification of a s editing complex. proc natl acad sci u s a : - .
snyder em, mccarty c, mehalow a, svenson kl, murray sa, korstanje r, braun re. .
apobec complementation factor (a cf) is dispensable for c-to-u rna editing in vivo. rna : - .
sowden m, hamm jk, spinelli s, smith hc. . determinants involved in regulating the proportion of edited apolipoprotein b rnas. rna : - .
teng b, burant cf, davidson no. . molecular cloning of an apolipoprotein b messenger rna editing protein. science : - .
wolfe ad, arnold db, chen xs. . comparison of rna editing activity of apobec -a cf and apobec -rbm complexes reconstituted in hek t cells. j mol biol : - .
wolfe ad, li s, goedderz c, chen xs. . the structure of apobec and insights into its rna and dna substrate selectivity. nar cancer : zcaa .
zuker m. . mfold web server for nucleic acid folding and hybridization prediction. nucleic acids res : - .

table . multivariable linear regression model for determinant factors of editing frequency in mouse apobec -dependent c-to-u mrna editing sites.

determinant of editing frequency subgroup ß ( % ci) p value
model without co-factor group n= ; r = . ; p<.
base content score per unit increments . [ . , . ] < .
count of mismatches in mooring sequence per unit increments - . [- . , - . ] <.
count of mismatches in regulatory sequence motif d (whole sequence) per unit increments - . [- . , - . ] .
au content of regulatory sequence motif b per % increments - . [- . , - . ] .
overall secondary structure c loop reference
c stem . [- . , . ] .
c tail - . [- . , - . ] .
non-canonical - . [- . , - . ] .
location of mooring sequence stem-loop reference
other - . [- . , - . ] <.
after adding co-factor group to the model n= ; r = . ; p<.
co-factor group rbm dominant reference
co-dominant - . [- . , - . ] .
a cf dominant . [- . , . ] .
ß: represents average change (%) in the editing frequency compared to the reference group
ci: confidence interval

table : characteristics of human c-to-u mrna editing targets

parameter low editing high editing
nf glycra glycra tph b tph b apob
editing location c c c c (exon ) c (exon ) c
tissue neural sheath / cns tumor hippocampus hippocampus amygdala amygdala small intestine
editing frequency (%) >
mismatches in regulatory motif a
mismatches in regulatory motif b
mismatches in regulatory motif c
mismatches in regulatory motif d
au content (%) in regulatory motif a
au content (%) in regulatory motif b
au content (%) in regulatory motif c*
au content (%) in regulatory motif d
spacer length*
spacer au content (%)
mismatches in spacer
mismatches in mooring*
au content (%) of downstream bases*
au content (%) of downstream bases
overall secondary structure canonical canonical canonical canonical canonical canonical
location of edited c* loop tail tail stem loop loop
location of mooring sequence stem-loop stem-loop stem-loop stem-loop stem-loop stem-loop
ratio of stem-loop bases* . . . . . .
free tail orientation symmetric symmetric asymmetric symmetric asymmetric asymmetric
composite score

cns: central nervous system
* these items were used to calculate the composite score (total score = ) as follows:
au content (%) in regulatory motif c: < %: , ≥ %:
spacer length: ≤ : , > :
mismatches in mooring: < : , ≥ :
au content (%) of downstream bases: > %: , ≤ %:
location of edited c in secondary structure: stem-loop: , tail:
ratio of stem-loop bases: > %: , ≤ %:
figure legends

figure . characteristics of murine apobec -mediated c-to-u mrna editing sites. a: schematic presentation of mrna target, chromosomal editing location, and editing sites considered. each mrna target could be edited at one or more chromosomal location(s) (blue boxes). each editing location could be edited in one or more tissues, giving rise to one or more editing site(s) per location (green boxes). editing site(s) of each mrna target are the sum of editing sites from all editing locations reported for that target. b: examples of canonical (apob chr : , top) and two types of non-canonical (kctd chr : and dcn chr : ) secondary structures. c: distribution of number of chromosomal editing location(s), or targeted cytidine(s), per mrna target. d: distribution of number of total editing sites per mrna target considering all chromosomal location(s) edited in different tissue(s). e: distribution of location of editing sites within gene structure.

figure . base content of sequences flanking modified cytidine in rna editing and dna mutation targets. a: base content of nucleotides upstream and nucleotides downstream of edited cytidine in mouse apobec -mediated c-to-u mrna editing targets. b: base content of nucleotides upstream and nucleotides downstream of mutated cytidine in proposed human apobec-mediated dna mutation targets in patients with breast cancer. c: comparison of au base content (%) of nucleotides flanking modified cytidine in rna editing targets and dna mutation targets in mouse and human breast cancer patients, respectively.

figure . characteristics of regulatory-spacer-mooring cassette and base content of individual nucleotides flanking edited cytidine in association with editing frequency. a: schematic illustration of regulatory-spacer-mooring cassette. four motifs were defined for
four motifs were defined for (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . regulatory sequence: motif a for nucleotides - to - ; motif b for nucleotides - to - ; motif c for nucleotides - to - ; motif d representative of the whole sequence. b: association of the mismatches in motif d of regulatory sequence with editing frequency. c: association between the au content (%) of regulatory sequence (motif b) and editing frequency. d: association of the mismatches in spacer (nucleotides + to + downstream of the edited cytidine) with editing frequency. e: association of the mismatches in mooring sequence with editing frequency. f: heatmap plot illustrating the association between base content of nucleotides flanking the edited cytidine with editing frequency. red color density in each cell represents the beta coefficient value of corresponding base in the multivariable linear regression model fit including that nucleotide. the asteriska refer to the nucleotides that were retained in the final model. mismatches in regulatory, spacer, and mooring sequences were determined in comparison to the corresponding sequences in apob mrna (as reference). r: pearson correlation coefficient. figure . secondary structure-related features in association with editing frequency. a: distribution of different types of overall secondary structure in editing sites. c loop, c stem, c tail are three subtypes of canonical secondary structure based on the location of the edited cytidine. b: association between type of secondary structure and editing frequency. c: distribution of the mooring sequence location in editing sites. “other” refers to mooring sequences located in tail or stem/loop and not part of the main stem-loop structure. 
d: association of mooring sequence location with editing frequency. e: association between ratio of main stem-loop bases to total bases count and editing frequency. f: association of the ’ free tail length with editing frequency. * p<. ; ** p<. . r: pearson correlation coefficient.

figure . dominance and tissue-specific cofactor patterns among editing sites. a: distribution of dominant co-factor in editosomes of editing sites. b: association of dominant co-factor with editing frequency. c: distribution of number of editing tissue(s) per mrna target. d: tissue distribution of editing sites. e: average editing frequency of editing sites edited in different tissues. si, small intestine.

figure . co-factor pattern and tissue-specific role in murine c-to-u mrna editing sites. a: distribution of editing tissue across subgroups of editing sites with different dominant co-factor patterns. b: location of edited cytidine in secondary structure of editing sites with different dominant co-factor patterns. c: schematic presentation of factors that correlate with dominant co-factor pattern in editing sites. this graph is based on the findings derived from pairwise multinomial logistic regression models.

supplemental figure legends

supplemental figure . chromosomal distribution of murine apobec -mediated c-to-u mrna editing sites.
the black curve corresponds to the left y-axis and represents average editing frequencies of editing sites related to each chromosome. the blue curve corresponds to the right y-axis and represents the number of editing sites related to each chromosome.

supplemental figure . association of editing frequency with characteristics of regulatory sequence in murine apobec -mediated c-to-u mrna editing sites. a-c. association of editing frequency with number of mismatches and au content (%). d-f. association of editing frequency with different regulatory sequence motifs. mismatches were determined in comparison to the same regulatory sequence motif in apob mrna (as reference).

supplemental figure . association of editing frequency with characteristics of downstream sequence in murine apobec -mediated c-to-u mrna editing sites. a. association of editing frequency with spacer length. b. association of editing frequency with spacer au content (%). c-f. association of editing frequency with and au content of successive segments downstream of the edited cytidine.

supplemental figure . association of editing frequency with secondary structure-related characteristics in c-to-u mrna editing sites. a: distribution of edited cytidine location in secondary structure regardless of the overall secondary structure. b: association of editing frequency with edited cytidine location in secondary structure. c: distribution of free tail orientation in editing sites. d: association of editing frequency with free tail orientation in editing sites. e: association of editing frequency with ’ free tail length. * p<. ; *** p<. . r: pearson correlation coefficient.

supplemental figure .
association of secondary structure-related characteristics with dominant co-factor pattern in apobec -mediated c-to-u mrna editing sites. a. distribution of mooring sequence location presented in the context of different dominant co-factor patterns. b. distribution of free tail orientation in secondary structure among editing sites, presented in the context of different dominant co-factor patterns.

supplemental table . multivariable linear regression model for individual nucleotides surrounding edited cytosine (- to + ) in mouse apobec -dependent c-to-u mrna editing sites.

location of nucleotide relative to edited c base preference ß ( % ci) p value
nucleotide - gu . [ . , . ] .
nucleotide - c . [ . , . ] .
nucleotide - g . [ . , . ] .
nucleotide - u . [ . , . ] .
nucleotide - auc . [ . , . ] < .
nucleotide - au . [ . , . ] .
nucleotide + agu . [ . , . ] < .
nucleotide + g . [ . , . ] < .
nucleotide + g . [ . , . ] < .
nucleotide + c . [ . , . ] .
nucleotide + g . [ . , . ] .
nucleotide + auc . [ . , . ] .
nucleotide + ac . [ . , . ] .
nucleotide + au . [ . , . ] .
nucleotide + au . [ . , . ] .
nucleotide + ac . [ . , . ] .

ß: represents average change (%) in the editing frequency compared to the reference group (non-preferred group)
ci: confidence interval

supplemental table . descriptive data of regulatory-spacer-mooring cassette in mouse apobec -dependent c-to-u mrna editing sites.
parameter n mean sd min max
sequence-related features
mismatches in regulatory (motif a) . .
mismatches in regulatory (motif b) . .
mismatches in regulatory (motif c) . .
mismatches in regulatory (motif d) . .
au content (%) of regulatory (motif a) . .
au content (%) of regulatory (motif b) . .
au content (%) of regulatory (motif c) . .
au content (%) of regulatory (motif d) . .
spacer length . .
mismatches in spacer . .
au content (%) of spacer . .
mismatches in mooring . .
au content (%) of downstream sequence + to + . .
au content (%) of downstream sequence + to + . .
au content (%) of downstream sequence + to + . .
au content (%) of downstream sequence + to + . .
secondary structure-related features
proportion of the bases that constitute main stem-loop . . .
length of ’ free tail . .
length of ’ free tail . .

sd: standard deviation

supplemental table . comparing three subgroups of mouse apobec -dependent c-to-u mrna editing sites based on co-factor dominance.

parameter rbm -dominant a cf-dominant co-dominant p value
n mean sd n mean sd n mean sd
mismatches in regulatory (motif a) . . . . . . .
mismatches in regulatory (motif b) . . . . . . .
mismatches in regulatory (motif c) . . . . . . .
mismatches in regulatory (motif d) . . . . . . .
au content (%) of regulatory (motif a) . . . . . . .
au content (%) of regulatory (motif b) . . . . . . .
au content (%) of regulatory (motif c) . . . . . . .
au content (%) of regulatory (motif d) . . . . . . .
spacer length . . . . . . .
mismatches in spacer (in -base cassette) . . . . . . .
mismatches in spacer (relative abundance (%)) . . . . . . .
au content (%) of spacer . . . . . . .
mismatches in mooring . . . . . . .
au content (%) of downstream sequence + to + . . . . . . . au content (%) of downstream sequence + to + . . . . . . . au content (%) of downstream sequence + to + . . . . . . . au content (%) of downstream sequence + to + . . . . . . . proportion of the bases that constitute main stem-loop . . . . . . . length of ’ free tail . . . . . . . length of ’ free tail . . . . . . . sd: standard deviation (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . supplemental table . multinomial logistic regression model for determinant factors of co-factor dominancy in mouse apobec -dependent c-to-u mrna editing sites. determinant of co-factor dominancy subgroup coefficient ( % ci) p value a cf-dominant vs rbm -dominant tissue small intestine reference liver . [ . , . ] . location of edited cytosine loop reference stem - . [- . , . ] . tail - . [- . , - . ] < . mismatches in mooring sequence per unit increments . [- . , . ] . mismatches in regulatory sequence motif b per unit increments . [ . , . ] . mismatches in regulatory sequence motif c per unit increments . [- . , . ] . au content (%) of regulatory sequence motif d per unit increments . [- . , . ] . au content (%) of downstream sequence + to + per unit increments - . [- . , . ] . au content (%) of downstream sequence + to + per unit increments - . [- . , - . ] . au content (%) of downstream sequence + to + per unit increments - . [- . , . ] . co-dominant vs rbm -dominant tissue small intestine reference liver - . [- . , . ] . location of edited cytosine in secondary structure c loop reference c stem . [- . , . ] . c tail . [ . , . ] . mismatches in mooring sequence per unit increments . [ . , . ] . mismatches in regulatory sequence motif b per unit increments - . [- . , - . ] . 
mismatches in regulatory sequence motif c per unit increments . [ . , . ] . au content (%) of regulatory sequence motif d per unit increments . [ . , . ] . au content (%) of downstream sequence + to + per unit increments - . [- . , - . ] . au content (%) of downstream sequence + to + per unit increments - . [- . , . ] . au content (%) of downstream sequence + to + per unit increments - . [- . , - . ] . co-dominant vs a cf -dominant tissue small intestine reference liver - . [- . , - . ] . location of edited cytosine in secondary structure c loop reference c stem . [ . , . ] . c tail . [ . , . ] < . mismatches in mooring sequence per unit increments . [- . , . ] . mismatches in regulatory sequence motif b per unit increments - . [- . , - . ] . mismatches in regulatory sequence motif c per unit increments . [ . , . ] . au content (%) of regulatory sequence motif d per unit increments - . [- . , . ] . au content (%) of downstream sequence + to + per unit increments - . [- . , . ] . au content (%) of downstream sequence + to + per unit increments - . [- . , . ] . au content (%) of downstream sequence + to + per unit increments - . [- . , . ] . model parameters: n= ; pseudo r = . ; p<. ci: confidence interval (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . 
evaluating the transcriptional fidelity of cancer models
da peng*, rachel gleyzer*, wen-hsin tai, pavithra kumar, qin bian, bradley issacs, edroaldo lummertz da rocha, stephanie cai, kathleen dinapoli, franklin w huang, patrick cahan
department of biomedical engineering, johns hopkins university school of medicine, baltimore md usa
institute for cell engineering, johns hopkins university school of medicine, baltimore md usa
department of microbiology, immunology and parasitology, federal university of santa catarina, florianópolis sc, brazil
department of cell biology, johns hopkins university school of medicine, baltimore, md usa
department of electrical and computer engineering, johns hopkins university, baltimore md usa
division of hematology/oncology, department of medicine; helen diller family cancer center; bakar computational health sciences institute; institute for human genetics; university of california, san francisco, san francisco, ca
department of molecular biology and genetics, johns hopkins university school of medicine, baltimore md usa
* these authors made equal contributions.
correspondence to: patrick.cahan@jhmi.edu
article type: research
website: http://www.cahanlab.org/resources/cancercellnet_web
code: https://github.com/pcahan /cancercellnet
biorxiv preprint doi: https://doi.org/ . / . . . ; this version posted january , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license (http://creativecommons.org/licenses/by-nc-nd/ . /).
abstract
background: cancer researchers use cell lines, patient derived xenografts, engineered mice, and tumoroids as models to investigate tumor biology and to identify therapies.
the generalizability and power of a model derive from the fidelity with which it represents the tumor type under investigation; however, the extent to which this is true is often unclear. the preponderance of models and the ability to readily generate new ones have created a demand for tools that can measure the extent and ways in which cancer models resemble or diverge from native tumors.
methods: we developed a machine learning-based computational tool, cancercellnet, that measures the similarity of cancer models to naturally occurring tumor types and subtypes, in a platform- and species-agnostic manner. we applied this tool to cancer cell lines, patient-derived xenografts, distinct genetically engineered mouse models, and tumoroids. we validated cancercellnet by application to independent data, and we tested several predictions with immunofluorescence.
results: we have documented the cancer models with the greatest transcriptional fidelity to natural tumors, we have identified cancers underserved by adequate models, and we have found models with annotations that do not match their classification. by comparing models across modalities, we report that, on average, genetically engineered mice and tumoroids have higher transcriptional fidelity than patient-derived xenografts and cell lines in four out of five tumor types. however, several patient-derived xenografts and tumoroids have classification scores that are on par with native tumors, highlighting both their potential as faithful model classes and their heterogeneity.
conclusions: cancercellnet enables the rapid assessment of transcriptional fidelity of tumor models. we have made cancercellnet available as freely downloadable software and as a web application that can be applied to new cancer models and allows for direct comparison to the cancer models evaluated here.
introduction
models are widely used to investigate cancer biology and to identify potential therapeutics. popular modeling modalities are cancer cell lines (ccls), genetically engineered mouse models (gemms), patient-derived xenografts (pdxs), and tumoroids. these classes of models differ in the types of questions that they are designed to address. ccls are often used to address cell-intrinsic mechanistic questions, gemms to chart progression of molecularly defined disease, and pdxs to explore patient-specific response to therapy in a physiologically relevant context. more recently, tumoroids have emerged as relatively inexpensive, physiological, in vitro d models of tumor epithelium with applications ranging from measuring drug responsiveness to exploring tumor dependence on cancer stem cells. models also differ in the extent to which they represent specific aspects of a cancer type. even with this intra- and inter-class model variation, all models should represent the tumor type or subtype under investigation, and not another type of tumor, and not a non-cancerous tissue. therefore, cancer models should be selected not only based on the specific biological question but also based on the similarity of the model to the cancer type under investigation. various methods have been proposed to determine the similarity of cancer models to their intended subjects.
domcke et al devised a 'suitability score' as a metric of the molecular similarity of ccls to high grade serous ovarian carcinoma based on a heuristic weighting of copy number alterations, mutation status of several genes that distinguish ovarian cancer subtypes, and hypermutation status. other studies have taken analogous approaches by focusing on either transcriptomic or ensemble molecular profiles (e.g. transcriptomic and copy number alterations) to quantify the similarity of cell lines to tumors. these studies were tumor-type specific, focusing on ccls that model, for example, hepatocellular carcinoma or breast cancer. notably, yu et al compared the transcriptomes of ccls to the cancer genome atlas (tcga) by correlation analysis, resulting in a panel of ccls recommended as most representative of tumor types. most recently, najgebauer et al and salvadores et al have developed methods to assess ccls using molecular traits such as copy number alterations (cna), somatic mutations, dna methylation and transcriptomics. while all of these studies have provided valuable information, they leave two major challenges unmet. the first challenge is to determine the fidelity of gemms, pdxs, and tumoroids, and whether there are stark differences between these classes of models and ccls. the other major unmet challenge is to enable the rapid assessment of new, emerging cancer models.
this challenge is especially relevant now as technical barriers to generating models have been substantially lowered, and because new models such as pdxs and tumoroids can be derived on a patient-specific basis and therefore each should be considered a distinct entity requiring individual validation. to address these challenges, we developed cancercellnet (ccn), a computational tool that uses transcriptomic data to quantitatively assess the similarity between cancer models and naturally occurring tumor types and subtypes in a platform- and species-agnostic manner. here, we describe ccn’s performance, and the results of applying it to assess ccls, pdxs, gemms, and tumoroids. this has allowed us to identify the most faithful models currently available, to document cancers underserved by adequate models, and to find models with inaccurate tumor type annotation. moreover, because ccn is open-source and easy to use, it can be readily applied to newly generated cancer models as a means to assess their fidelity.
results
cancercellnet classifies samples accurately across species and technologies
previously, we had developed a computational tool using the random forest classification method to measure the similarity of engineered cell populations to their in vivo counterparts based on transcriptional profiles. more recently, we elaborated on this approach to allow for classification of single cell rna-seq data in a manner that allows for cross-platform and cross-species analysis. here, we used an analogous approach to build a
platform that would allow us to quantitatively compare cancer models to naturally occurring patient tumors (fig a). in brief, we used tcga rna-seq expression data from solid tumor types to train a top-pair multi-class random forest classifier (fig b). we combined training data from rectal adenocarcinoma (read) and colon adenocarcinoma (coad) into one coad_read category because read and coad are considered to be virtually indistinguishable at a molecular level. we included an ‘unknown’ category trained using randomly shuffled gene-pair profiles generated from the training data of tumor types to identify query samples that are not reflective of any of the training data. to estimate the performance of ccn and how it is impacted by parameter variation, we performed a parameter sweep with a -fold / cross-validation strategy (i.e. / of the data sampled across each cancer type was used to train, / was used to validate) (fig c). the performance of ccn, as measured by the mean area under the precision recall curve (auprc), did not fall below . and remained relatively stable across parameter sets (supp fig a). the optimal parameters resulted in , features. the mean auprcs exceeded . in most tumor types with this optimal parameter set (fig d, supp fig b). the auprcs of ccn applied to independent rna-seq data from tumors across five tumor types from the international cancer genome consortium (icgc) ranged from . to . , supporting the notion that the platform is able to accurately classify tumor samples from diverse sources (fig e). as one of the central aims of our study is to compare distinct cancer models, including gemms, our method needed to be able to classify mouse and human samples equivalently. we used the top-pair transform to achieve this and we tested the feasibility of this approach by assessing the performance of a normal (i.e. non-tumor) cell and tissue classifier trained on human data as applied to mouse samples.
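The top-pair transform is named in this passage but not spelled out. The following is a minimal sketch, assuming the transform compares the expression of two genes within one sample and keeps only which of the two is higher. Because each feature depends on rank order inside a single sample, data measured on different platforms or in different species (after orthologs are mapped to shared gene symbols) can yield the same binary profile. The gene pairs, expression values, and function names are illustrative assumptions, not the pairs or code used by ccn, and the shuffling helper is only one simple way to scramble a profile for an 'unknown' class.

```python
import random


def top_pair_transform(expression, gene_pairs):
    """Binary feature per pair: 1 if the first gene is expressed above the second."""
    return [1 if expression[a] > expression[b] else 0 for a, b in gene_pairs]


def shuffled_unknown_profile(binary_profile, rng):
    """Scramble feature positions to mimic a profile that matches no tumor type.

    Assumption: the source only says 'randomly shuffled gene-pair profiles';
    a within-profile permutation is used here for illustration.
    """
    shuffled = list(binary_profile)
    rng.shuffle(shuffled)
    return shuffled


# Hypothetical gene pairs and made-up expression values on two different scales.
GENE_PAIRS = [("KRT5", "ALB"), ("ALB", "CDX2"), ("CDX2", "KRT5")]
human_rnaseq = {"KRT5": 8.0, "ALB": 0.5, "CDX2": 3.0}
mouse_array = {"KRT5": 120.0, "ALB": 10.0, "CDX2": 60.0}

human_features = top_pair_transform(human_rnaseq, GENE_PAIRS)
mouse_features = top_pair_transform(mouse_array, GENE_PAIRS)
unknown_features = shuffled_unknown_profile(human_features, random.Random(0))
```

Both samples map to the same binary vector even though their raw values differ by more than an order of magnitude, which is the property a cross-platform, cross-species classifier needs from its input features.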
consistent with prior applications, we found that the cross-species classifier performed well, achieving a mean auprc of . when applied to mouse data (supp fig c).
to evaluate cancer models at a finer resolution, we also developed an approach to perform tumor subtype classifications (supp fig d). we constructed different cancer subtype classifiers based on the availability of expression or histological subtype information. we also included non-cancerous, normal tissues as categories for several subtype classifiers when sufficient data was available: breast invasive carcinoma (brca), coad_read, head and neck squamous cell carcinoma (hnsc), kidney renal clear cell carcinoma (kirc) and uterine corpus endometrial carcinoma (ucec). the subtype classifiers all achieved high overall average auprcs ranging from . to . (supp fig e).
fidelity of cancer cell lines
having validated the performance of ccn, we then used it to determine the fidelity of ccls. we mined rna-seq expression data of different cell lines across cancer types from the cancer cell line encyclopedia (ccle) and applied ccn to them, finding a wide classification range for cell lines of each tumor type (fig a, supp tab ). to verify the classification results, we applied ccn to expression profiles from ccle generated through microarray expression profiling. to ensure that ccn would function on microarray data, we first tested it by applying a ccn classifier created to test microarray data to expression profiles of tumor types.
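Classifier performance in this section is reported as auprc. The text does not say how the precision-recall curve was integrated, so the sketch below assumes the common step-wise average-precision estimator, computed one-vs-rest for each class and then averaged across classes. All scores and labels are made up for illustration, and tied scores are not given special treatment.

```python
def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve for one class.

    labels: 1 for samples of the class, 0 otherwise.
    scores: classifier scores for that class, higher meaning more confident.
    """
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("at least one positive sample is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            true_positives += 1
            total += true_positives / rank  # precision at this recall step
    return total / n_positive


def mean_auprc(score_rows, true_labels, classes):
    """Average the one-vs-rest AUPRC over all classes."""
    values = []
    for cls in classes:
        labels = [1 if truth == cls else 0 for truth in true_labels]
        scores = [row[cls] for row in score_rows]
        values.append(average_precision(labels, scores))
    return sum(values) / len(values)


# Hypothetical scores for four samples and two classes.
EXAMPLE_SCORES = [
    {"A": 0.90, "B": 0.10},
    {"A": 0.60, "B": 0.30},
    {"A": 0.20, "B": 0.70},
    {"A": 0.35, "B": 0.40},
]
EXAMPLE_TRUTH = ["A", "A", "B", "B"]
```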
the cross-platform ccn classifier performed well, based on the comparison to study-provided annotation, achieving a mean auprc of . (supp fig a). next, we applied this cross-platform classifier to microarray expression profiles from ccle (supp fig b). from the classification results of cell lines that have both rna-seq and microarray expression profiles, we found a strong overall positive association between the classification scores from rna-seq and those from microarray (supp fig c). this comparison supports the notion that the classification scores for each cell line are not artifacts of profiling methodology. moreover, this comparison shows that the scores are consistent between the times that the cell lines were first assayed by microarray expression profiling in and by rna-seq in . we also observed a high level of correlation between our analysis and the analysis done by yu et al (supp fig d), further validating the robustness of the ccn results. next, we assessed the extent to which ccn classifications agreed with their nominal tumor type of origin, which entailed translating quantitative ccn scores to classification labels. to achieve this, we selected a decision threshold that maximized the macro f measure, the harmonic mean of precision and recall, across cross validations. then, we annotated cell lines based on their ccn score profile as follows. cell lines with ccn scores > threshold for the tumor type of origin were annotated as 'correct'. cell lines with ccn scores > threshold in the tumor type of origin and at least one other tumor type were annotated as 'mixed'.
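Turning scores into labels, as described in this passage, has two parts: picking one threshold by macro-averaged f measure and then annotating each cell line against it. The sketch below is a minimal illustration under stated assumptions: the candidate thresholds, scores, and the value 0.25 used in the usage note are hypothetical, a tumor type counts as called when its score is strictly above the threshold, and the four labels follow the scheme in the text ('correct' for the type of origin only, 'mixed' for origin plus another type, 'other' for non-origin types only, 'none' when nothing is called).

```python
def macro_f_measure(score_rows, true_labels, classes, threshold):
    """Mean over classes of the per-class F measure at one decision threshold."""
    per_class = []
    for cls in classes:
        tp = fp = fn = 0
        for scores, truth in zip(score_rows, true_labels):
            called = scores.get(cls, 0.0) > threshold
            if called and truth == cls:
                tp += 1
            elif called:
                fp += 1
            elif truth == cls:
                fn += 1
        denominator = 2 * tp + fp + fn
        per_class.append(2 * tp / denominator if denominator else 0.0)
    return sum(per_class) / len(per_class)


def pick_threshold(score_rows, true_labels, classes, candidates):
    """Return the candidate threshold with the highest macro F measure."""
    return max(
        candidates,
        key=lambda t: macro_f_measure(score_rows, true_labels, classes, t),
    )


def annotate(scores, origin, threshold):
    """Label one cell line from its score profile and nominal tumor type."""
    called = {cls for cls, score in scores.items() if score > threshold}
    if not called:
        return "none"
    if origin not in called:
        return "other"
    return "correct" if called == {origin} else "mixed"


# Hypothetical validation scores for four samples and two classes.
VALIDATION_SCORES = [
    {"A": 0.90, "B": 0.10},
    {"A": 0.60, "B": 0.30},
    {"A": 0.20, "B": 0.70},
    {"A": 0.35, "B": 0.40},
]
VALIDATION_TRUTH = ["A", "A", "B", "B"]
CANDIDATES = [0.1, 0.25, 0.5, 0.8]
```

For instance, `annotate({"BRCA": 0.5, "LUAD": 0.4}, "BRCA", 0.25)` gives 'mixed' because both types exceed the illustrative threshold.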
cell lines with ccn scores > threshold for tumor types other than that of the cell line's origin were annotated as 'other'. cell lines that did not receive a ccn score > threshold for any tumor type were annotated as 'none' (fig b). we found that the majority of cell lines originally annotated as breast invasive carcinoma (brca), cervical squamous cell carcinoma and endocervical adenocarcinoma (cesc), skin cutaneous melanoma (skcm), colorectal cancer (coad_read) and sarcoma (sarc) fell into the 'correct' category (fig b). on the other hand, no esophageal carcinoma (esca), pancreatic adenocarcinoma (paad) or brain lower grade glioma (lgg) cell lines were classified as 'correct', demonstrating the need for more transcriptionally faithful cell lines that model those general cancer types. there are several possible explanations for cell lines not receiving a 'correct' classification. one possibility is that the sample was incorrectly labeled in the study from which we harvested the expression data. consistent with this explanation, we found that colorectal cancer line nci-h , a cell line labelled as liver hepatocellular carcinoma (lihc) by ccle, was classified strongly as coad_read (supp tab ). another possibility to explain a low ccn score is that cell lines were derived from subtypes of tumors that are not well-represented in tcga. to explore this hypothesis, we first performed tumor subtype classification on ccls from tumor types for which we had trained subtype classifiers (supp tab ). we reasoned that if
a cell line was a good model for a rarer subtype, then it would receive a poor general classification but a high classification for the subtype that it models well. therefore, we counted the number of lines that fit this pattern. we found that of the lines with no general classification, ( %) were classified as a specific subtype, suggesting that derivation from rare subtypes is not the major contributor to the poor overall fidelity of ccls. another potential contributor to low scoring cell lines is intra-tumor stromal and immune cell impurity in the training data. if impurity were a confounder of ccn scoring, then we would expect a strong positive correlation between mean purity and mean ccn classification scores of ccls per general tumor type. however, the pearson correlation coefficient between the mean purity of general tumor type and mean ccn classification scores of ccls in the corresponding general tumor type was low ( . ), suggesting that tumor purity is not a major contributor to the low ccn scores across ccls (supp fig e).
comparison of skcm and gbm ccls to scrna-seq
to more directly assess the impact of intra-tumor heterogeneity in the training data on evaluating cell lines, we constructed a classifier using cell types found in human melanoma and glioblastoma scrna-seq data. previously, we have demonstrated the feasibility of using our classification approach on scrna-seq data. our scrna-seq classifier achieved a high average auprc ( . ) when applied to held-out data and a high mean auprc ( . ) when applied to a few purified bulk testing samples (supp fig a-b). comparing the ccn scores from the bulk rna-seq general classifier and the scrna-seq classifier, we observed a high level of correlation (pearson correlation of . ) between the skcm ccn classification scores and scrna-seq skcm malignant ccn classification scores for skcm cell lines (fig c, supp fig c).
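Pearson correlation carries two arguments in this passage: the purity check (mean tumor purity against mean ccn score per tumor type) and the comparison of bulk and scrna-seq classifier scores. As a minimal sketch of the statistic itself, the function below computes the coefficient from two equal-length lists. The function name and the numbers in the usage note are invented for illustration and are not values from the study.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(spread_x * spread_y)
```

With hypothetical per-tumor-type means such as `pearson_correlation([0.6, 0.7, 0.8, 0.9], [0.30, 0.10, 0.35, 0.20])`, a coefficient near zero would argue against purity driving the scores, which is the reasoning used in the text.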
of the skcm cell lines that were classified as skcm by the bulk classifier, were also classified as skcm malignant cells by the scrna-seq classifier. interestingly, we also observed a high correlation between the sarc ccn classification scores and scrna-seq cancer associated fibroblast (caf) ccn classification scores (pearson correlation of . ). six of the seven skcm cell lines that had been classified as exclusively sarc by ccn were classified as caf by the scrna-seq classifier (fig d, supp fig c), which suggests the possibility that these cell lines were derived from caf or other mesenchymal populations, or that they have acquired a mesenchymal character through their derivation. the high level of agreement between scrna-seq and bulk rna-seq classification results shows that heterogeneity in the training data of the general ccn classifier has little impact on the classification of skcm cell lines. in contrast, we observed a weaker correlation between gbm ccn classification scores and scrna-seq gbm neoplastic ccn classification scores (pearson correlation of . ) for gbm cell lines (fig e, supp fig d). of the gbm lines that were not classified as gbm with ccn, were classified as gbm neoplastic cells with the scrna-seq classifier. among the gbm lines that were classified as sarc with ccn, cell lines were classified as caf (fig f), which were classified as both gbm neoplastic and caf in the scrna-seq classifier.
similar to the situation with skcm lines that classify as caf, this result is consistent with the possibility that some gbm lines classified as sarc by ccn could be derived from mesenchymal subtypes exhibiting both strong mesenchymal signatures and glioblastoma signatures or that they have acquired a mesenchymal character through their derivation. the lower level of agreement between scrna-seq and bulk rna-seq classification results for gbm models suggests that the heterogeneity of glioblastomas can impact the classification of gbm cell lines, and that the use of a scrna-seq classifier can resolve this deficiency.
immunofluorescence confirmation of ccn predictions
to experimentally explore some of our computational analyses, we performed immunofluorescence on three cell lines that were not classified as their labelled categories: the ovarian cancer line sk-ov- had a high ucec ccn score ( . ), the ovarian cancer line a had a high testicular germ cell tumors (tgct) ccn score ( . ), and the prostate cancer line pc- had a high bladder cancer (blca) score ( . ) (supp tab ). we reasoned that if sk-ov- , a and pc- were classified most strongly as ucec, tgct and blca, respectively, then they would express proteins that are indicative of these cancer types. first, we measured the expression of the uterine-associated transcription factor hoxb , and the ucec serous ovarian tumor biomarker wt in sk-ov- , in the ov cell line caov- , and in the ucec cell line hec- .
we chose caov- as our positive control for ov biomarker expression because it was determined by our analysis and others to be a good model of ov. likewise, we chose hec- to be a positive control for ucec. we found that sk-ov- has a small percentage ( %) of cells that expressed the uterine marker hoxb and a large proportion ( %) of cells that expressed wt (fig a). in contrast, no caov- cells expressed hoxb , whereas % of cells expressed wt . this suggests that sk-ov- exhibits biomarkers of both ovarian tumor and uterine tissue. from our computational analysis and experimental validation, sk-ov- is most likely an endometrioid subtype of ovarian cancer. this result is also consistent with prior classification of sk-ov- , and the fact that sk-ov- lacks p mutations, which are prevalent in high-grade serous ovarian cancer, and it harbors an endometrioid-associated mutation in arid a. next, we measured the expression of markers of ov and germ cell cancers (lin a ) in the ov-annotated cell line a , which received a high tgct ccn score. we found that % of a cells expressed lin a whereas it was not detected in caov- (fig b). the ov marker wt was also expressed in fewer a cells as compared to caov- ( % vs %), which suggests that a could be a germ cell derived ovarian tumor. taken together, our results suggest that sk-ov- and a could represent ov subtypes that are not well represented in tcga training data, which resulted in a low ov score and higher ccn score in other categories. lastly, we examined pc- , annotated as a prad cell line but classified to be most similar to blca. we found that % of the pc- cells expressed pparg, a contributor to urothelial differentiation that is not detected in the prad vcap cell line but is highly expressed
in the blca rt cell line (fig c). pc- cells also expressed the prad biomarker folh , suggesting that pc- has a prad origin and gained urothelial or luminal characteristics through the derivation process. in short, our limited experimental data support the ccn classification results.
subtype classification of cancer cell lines
next, we explored the subtype classification of ccls from three general tumor types in more depth. we focused our subtype visualization (fig a-c) on ccl models with general ccn score above . in their nominal cancer type as this allowed us to analyze those models that fell below the general threshold but were classified as a specific sub-type (supp tab - ). focusing first on ucec, the histologically defined subtypes of ucec, endometrioid and serous, differ in prevalence, molecular properties, prognosis, and treatment. for instance, the endometrioid subtype, which accounts for approximately % of uterine cancers, retains estrogen receptor and progesterone receptor status and is responsive towards progestin therapy. serous, a more aggressive subtype, is characterized by the loss of estrogen and progesterone receptor and is not responsive to progestin therapy. ccn classified the majority of the ucec cell lines as serous except for jhuem- which is classified as mixed, with similarities to both endometrioid and serous (fig a). the preponderance of ccle lines of serous versus endometrioid character may be due to properties of serous cancer cells that promote their in vitro propagation, such as upregulation of cell adhesion transcriptional programs. some of our subtype classification results are consistent with prior observations.
for example, hec- a, hec- b, and kle were previously characterized as type ii endometrial cancer, which includes a serous histological subtype. on the other hand, our subtype classification results contradict prior observations in at least one case. for instance, the ishikawa cell line was derived from type i endometrial cancer (endometrioid histological subtype); however, ccn classified a derivative of this line, ishikawa er-, as serous. the high serous ccn score could result from a shift in phenotype of the line concomitant with its loss of estrogen receptor (er) as this is a distinguishing feature of type ii endometrial cancer (serous histological subtype). taken together, these results indicate a need for more endometrioid-like ccls. next, we examined the subtype classification of lung squamous cell carcinoma (lusc) and lung adenocarcinoma (luad) cell lines (fig b-c). all the lusc lines with at least one subtype classification had an underlying primitive subtype classification. this is consistent either with the ease of deriving lines from tumors with a primitive character, or with a process by which cell line derivation promotes similarity to a more primitive subtype, which is marked by increased cellular proliferation. some of our results are consistent with prior reports that have investigated the resemblance of some lines to lusc subtypes. for example, hcc- , previously characterized as classical, had a maximum ccn score in the classical subtype ( . ).
similarly, ludlu- and eplc- h, previously reported as classical and basal respectively, had maximal tumor subtype ccn scores for these sub-types ( . and . ) (fig b, supp tab ) despite being classified as unknown. lastly, the luad cell lines that were classified as a subtype were classified as either proximal inflammation or proximal proliferation (fig c). rerf-lc-ad had the highest general classification score and the highest proximal inflammation subtype classification score. taken together, these subtype classification results have revealed an absence of cell line models for basal and secretory lusc, and for the terminal respiratory unit (tru) luad subtype.

cancer cell lines’ popularity and transcriptional fidelity

finally, we sought to measure the extent to which cell line transcriptional fidelity related to model prevalence. we used the number of papers in which a model was mentioned, normalized by the number of years since the cell line was documented, as a rough approximation of model prevalence. to explore this relationship, we plotted the normalized citation count versus general classification score, labeling the highest cited and highest classified cell lines from each general tumor type (fig d). for most of the general tumor types, the highest cited cell line is not the highest classified cell line, except for hep g , ags and ml- , representing liver hepatocellular carcinoma (lihc), stomach adenocarcinoma (stad), and thyroid carcinoma (thca), respectively.
on the other hand, the general scores of the highest cited cell lines representing blca (t ), brca (mda-mb- ), and prad (pc- ) fall below the classification threshold of . . notably, each of these tumor types has other lines with scores exceeding . , which should be considered as more faithful transcriptional models when selecting lines for a study (supp tab and http://www.cahanlab.org/resources/cancercellnet_results/).

evaluation of patient derived xenografts

next, we sought to evaluate a more recent class of cancer models: pdx. to do so, we subjected the rna-seq expression profiles of pdx models from different types of cancer generated previously to ccn. similar to the results for ccls, the pdxs exhibited a wide range of classification scores (fig a, supp tab ). by categorizing the ccn scores of pdxs based on the proportion of samples associated with each tumor type that were correctly classified, we found that sarc, skcm, coad_read and brca have a higher proportion of correctly classified pdxs than other cancer categories (fig b). in contrast to ccls, we found a higher proportion of correctly classified pdxs in stad, paad and kirc (fig b). however, similar to ccls, no esca pdxs were classified as such. this held true when we performed subtype classification on pdx samples: none of the esca pdxs were classified as any of the esca subtypes (supp tab ). ucec pdxs included endometrioid, serous, and mixed subtypes, which provided a broader representation than ccls (fig c). several lusc pdxs that were classified as a subtype were also classified as head and neck squamous cell carcinoma (hnsc) or as a mix of hnsc and lusc (fig d). this could be due to the similarity in expression profiles of basal and classical subtypes of hnsc and lusc, which is consistent with the observation that these pdxs were also subtyped as classical. no lusc pdxs were classified as the secretory subtype. in contrast to luad ccls, four of the five luad pdxs with a discernible sub-type were classified as proximal inflammatory (fig e). on the other hand, similar to the ccls, there were no tru subtypes in the luad pdx cohort. in summary, we found that while individual pdxs can reach extremely high transcriptional fidelity to both general tumor types and subtypes, many pdxs were not classified as the general tumor type from which they originated.

evaluation of gemms

next, we used ccn to evaluate gemms of six general tumor types from nine studies for which expression data was publicly available. as was true for ccls and pdxs, gemms also had a wide range of ccn scores (fig a, supp tab ). we next categorized the ccn scores based on the proportion of samples associated with each tumor type that were correctly classified (fig b). in contrast to lgg ccls, lgg gemms, generated by nf mutations expressed in different neural progenitors in combination with pten deletion, were consistently classified as lgg (fig a-b). the gemm dataset included multiple replicates per model, which allowed us to examine intra-gemm variability. both at the level of ccn score and at the level of categorization, gemms were invariant. for example, replicates of ucec gemms driven by prg(cre/+)pten(lox/lox) received almost identical general ccn scores (fig c, supp tab ). gemms sharing genotypes across studies, such as luad gemms driven by kras mutation and loss of p , also received similar general and subtype classification scores (fig a,b,e). next, we explored the extent to which genotype impacted subtype classification in ucec, lusc, and luad.
prg(cre/+)pten(lox/lox) gemms had a mixed subtype classification of both serous and endometrioid, consistent with the fact that pten loss occurs in both subtypes (albeit more frequently in endometrioid). we also analyzed prg(cre/+)pten(lox/lox)csf r-/- gemms. polymorphonuclear neutrophils (pmns), which play anti-tumor roles in endometrioid cancer progression, are depleted in these animals. interestingly, prg(cre/+)pten(lox/lox)csf r-/- gemms had a serous subtype classification, which could be explained by differences in pmn involvement in endometrioid versus serous uterine tumor development that are reflected in the respective transcriptomes of the tcga ucec training data. we note that the tumor cells were sorted prior to rna-seq and thus the shift in subtype classification is not due to contamination of gemms with non-tumor components. in short, this analysis supports the argument that tumor-cell extrinsic factors, in this case a reduction in anti-tumor pmns, can shift the transcriptome of a gemm so that it more closely resembles a serous rather than an endometrioid subtype. the lusc gemms that we analyzed were lkb fl/fl and they either overexpressed sox (via two distinct mechanisms) or were also ptenfl/fl. we note that the eight lenti-sox -cre-infected;lkb fl/fl and rosa lsl-sox -ires-gfp;lkb fl/fl samples that classified as 'unknown' had lusc ccn scores only modestly lower than the decision threshold (fig d) (mean ccn score = . ). thirteen out of the of the sox gemms classified as the secretory subtype of lusc.
the consistency is not surprising given that both models overexpress sox and lose lkb . on the other hand, the lkb fl/fl;ptenfl/fl gemms had substantially lower general lusc ccn scores and our subtype classification indicated that this gemm was mostly classified as 'unknown', in contrast to prior reports suggesting that it is most similar to a basal subtype. none of the three lusc gemms have strong classical ccn scores. most of the luad gemms, which were generated using various combinations of activating kras mutation, loss of trp , and loss of smarca l, were correctly classified (fig e). those that were not classified had ccn scores modestly lower than the decision threshold (mean ccn score = . ). there were no substantial differences in general or subtype classification across driver genotypes. although the sub-type of all luad gemms was 'unknown', the subtypes tended to have a mixture of high ccn proximal proliferation, proximal inflammation and tru scores. taken together, this analysis suggests that there is a degree of similarity, and perhaps plasticity, between the primitive and secretory (but not basal or classical) subtypes of lusc. on the other hand, while the luad gemms classify strongly as luad, they do not have a strong particular subtype classification, a result that does not vary by genotype.

evaluation of tumoroids

lastly, we used ccn to assess a relatively novel cancer model: tumoroids.
we downloaded and assessed distinct tumoroid expression profiles spanning cancer categories from the nci patient-derived models repository (pdmr) and from three individual studies (fig a, supp tab ). we note that several categories have three or fewer samples (brca, cesc, kirp, ov, lihc, and blca from pdmr). among the cancer categories represented by more than three samples, only lusc and paad have fewer than % classified as their annotated label (fig b). in contrast to gbm ccls, all three induced pluripotent stem cell-derived gbm tumoroids were classified as gbm with high ccn scores (mean = . ). to further characterize the tumoroids, we performed subtype classification on them (supp tab ). ucec tumoroids from pdmr contain a wide range of subtypes, with two endometrioid, two serous and one mixed type (fig c). on the other hand, lusc tumoroids appear to be predominantly of the classical subtype, with one tumoroid classified as a mix between classical and primitive (fig d). lastly, similar to the ccl and pdx counterparts, luad tumoroids are classified as proximal inflammatory and proximal proliferation, with no tumoroids classified as the tru subtype (fig e).

comparison of ccls, pdxs, gemms and tumoroids

finally, we sought to estimate the comparative transcriptional fidelity of the four cancer model modalities. we compared the general ccn scores of each model on a per tumor type basis (fig ). in the case of gemms, we used the mean classification score of all samples with shared genotypes. we also used the mean classification of technical replicates found in lihc tumoroids. we evaluated models based on both the maximum ccn score, as this represents the potential for a model class, and the median ccn score, as this indicates the current overall transcriptional fidelity of a model class. pdxs achieved the highest ccn scores in three (ucec, paad, luad) out of the five cancer categories in which all four modalities were available (fig ), despite having low median ccn scores. notably, pdxs have a median ccn score above the . threshold in paad while none of the other three modalities have any samples above the threshold. in lihc, the highest ccn score for pdx ( . ) is only slightly lower than the highest ccn score for tumoroid ( . ). this suggests that certain individual pdxs most closely mimic the transcriptional state of native patient tumors despite a portion of the pdxs having low ccn scores. similarly, while the majority of the ccls have low ccn scores, several lines achieve high transcriptional fidelity in lusc, luad and lihc (fig ). collectively, gemms and tumoroids had the highest median ccn scores in four of the five model classes (lusc and luad for gemms and ucec and lihc for tumoroids). notably, both of the lihc tumoroids achieved ccn scores on par with patient tumors (fig ). in brief, this analysis indicates that pdxs and ccls are heterogeneous in terms of transcriptional fidelity, with a portion of the models highly mimicking native tumors and the majority of the models having low transcriptional fidelity (with the exception of paad for pdxs). on the other hand, gemms and tumoroids displayed a consistently high fidelity across different models. because the ccn score is based on a moderate number of gene features (i.e. , gene pairs consisting of , unique genes) relative to the total number of protein-coding genes in the genome, it is possible that a cancer model with a high ccn score might not have a high global similarity to a naturally occurring tumor.
therefore, we also calculated the grn status, a metric of the extent to which a tumor-type specific gene regulatory network is established, for all models (supp fig ). we observed a high level of correlation between the two similarity metrics, which suggests that although ccn classifies on a selected set of genes, its scores are highly correlated with a global assessment of transcriptional similarity. we also sought to compare model modalities in terms of the diversity of subtypes that they represent (supp fig ). as a reference, we also included in this analysis the overall subtype incidence, as approximated by incidence in tcga. replicates in gemms and tumoroids were averaged into one classification profile. in models of ucec, there is a notable difference between endometrioid incidence and the proportion of models classified as endometrioid, with only pdxs and tumoroids having any representatives (supp fig ). all of the ccl, gemm, and tumoroid models of paad have an unknown subtype classification and no correct general classification. however, the majority of pdxs are subtyped as either a mixture of basal and classical, or classical alone. luad has proximal inflammation and proximal proliferation subtypes modelled by ccls and pdxs (supp fig ). likewise, lusc has basal, classical and primitive subtypes modelled by ccls and pdxs, and the secretory subtype modelled by gemms exclusively (supp fig ). taken together, these results demonstrate the need to carefully select different model systems to more suitably model certain cancer subtypes.
discussion

a major goal in the field of cancer biology is to develop models that mimic naturally occurring tumors with enough fidelity to enable therapeutic discoveries. however, methods to measure the extent to which cancer models resemble or diverge from native tumors are lacking. this is especially problematic now because there are many existing models from which to choose, and it has become easier to generate new models. here, we present cancercellnet (ccn), a computational tool that measures the similarity of cancer models to naturally occurring tumor types and subtypes. while the similarity of ccls to patient tumors has already been explored in previous work, our tool introduces the capability to assess the transcriptional fidelity of pdxs, gemms, and tumoroids. because ccn is platform- and species-agnostic, it represents a consistent platform to compare models across modalities including ccls, pdxs, gemms and tumoroids. here, we applied ccn to cancer cell lines, patient derived xenografts, distinct genetically engineered mouse models and tumoroids. several insights emerged from our computational analyses that have implications for the field of cancer biology. first, pdxs have the greatest potential to achieve transcriptional fidelity in three out of the five general tumor types for which data from all modalities was available, as indicated by the high scores of individual pdxs. notably, pdxs are the only modality with samples classified as paad. at the same time, the median ccn scores of pdxs were lower than those of gemms and tumoroids in the other four tumor types.
it is unclear what causes such a wide range of ccn scores within pdxs. we suspect that some pdxs might have undergone selective pressures in the host that distort the progression of genomic alterations away from what is observed in natural tumors. future work to understand this heterogeneity is important so as to yield consistently high fidelity pdxs, and to identify intrinsic and host-specific factors that so powerfully shape the pdx transcriptome. second, in general gemms and tumoroids have higher median ccn scores than those of pdxs and ccls. this is consistent with the fact that gemms are typically derived by recapitulating well-defined driver mutations of natural tumors, and thus this observation corroborates the importance of genetics in the etiology of cancer. moreover, in contrast to most pdxs, gemms are typically generated in immune replete hosts. therefore, the higher overall fidelity of gemms may also be a result of the influence of a native immune system on gemm tumors. the high median ccn scores of tumoroids can be attributed to several factors, including the increased mechanical stimuli and cell-cell interactions that come from d self-organizing cultures. third, we have found that none of the samples that we evaluated here are transcriptionally adequate models of esca. this may be due to an inherent lability of the esca transcriptome that is often preceded by a metaplasia that has obscured determining its cell type(s) of origin. therefore, this tumor type requires further attention to derive new models. fourth, we found that in several tumor types, gemms tend to reflect mixtures of subtypes rather than conforming strongly to single subtypes. the reasons for this are not clear, but it is possible that in the cases that we examined the histologically defined subtypes have a degree of plasticity that is exacerbated in the murine host environment. lastly, we recognize that many ccls are not classified as their annotated labels. while we have suggested that the lack of an immune component is not a major confounder, we suspect that the ccls could undergo genetic divergence due to a high number of passages, chemotherapy before biopsy, culture conditions and genetic instability, which could all be factors that drive ccls away from their labelled tumors.

currently, there are several limitations to our ccn tool, and caveats to our analyses, which indicate areas for future work and improvement. first, ccn is based on transcriptomic data, but other molecular readouts of tumor state, such as profiles of the proteome, epigenome, non-coding rna-ome, and genome, would be equally, if not more, important to mimic in a model system. therefore, it is possible that some models reflect tumor behavior well, and because this behavior is not well predicted by the transcriptome alone, these models have lower ccn scores. to both measure the extent to which such situations exist, and to correct for them, we plan in the future to incorporate other omic data into ccn so as to make more accurate and integrated model evaluation possible. as a first step in this direction, we plan to incorporate dna methylation and genomic sequencing data as additional features for our random forest classifier, as this data is becoming more readily available for both training and cancer models. we expect that this will allow us both to refine our tumor subtype categories and to make more accurate predictions of how models respond to perturbations such as drug treatment.
a second limitation is that in the cross-species analysis, ccn implicitly assumes that homologs are functionally equivalent. the extent to which they are not functionally equivalent determines how confounded the ccn results will be. this possibility seems to be of limited consequence based on the high performance of the normal tissue cross-species classifier and based on the fact that gemms have the highest median ccn scores (in addition to tumoroids). a third caveat to our analysis is that there were many fewer distinct gemms and tumoroids than ccls and pdxs. as more transcriptional profiles for gemms and tumoroids emerge, this comparative analysis should be revisited to assess the generality of our results. finally, the tcga training data is made up of rna-seq from bulk tumor samples, which necessarily includes non-tumor cells, whereas the ccls are by definition cell lines of tumor origin. therefore, ccls theoretically could have artificially low ccn scores due to the presence of non-tumor cells in the training data. this problem appears to be limited, as we found no correlation between tumor purity and ccn score in the ccle samples. however, this problem is related to the question of intra-tumor heterogeneity. we demonstrated the feasibility of using ccn and single cell rna-seq data to refine the evaluation of cancer cell lines, contingent upon availability of scrna-seq training data. as more training single cell rna-seq data accrues, ccn would be able to evaluate models not only on a per cell type basis, but also based on cellular composition.
we have made the results of our analyses available online so that researchers can easily explore the performance of selected models or identify the best models for any of the general tumor types and the subtypes presented here. to ensure that ccn is widely available, we have developed a free web application, which performs ccn analysis on user-uploaded data and allows for direct comparison of their data to the cancer models evaluated here. we have also made the ccn code freely available under an open source license and as an easily installed r package, and we are actively supporting its further development. included in the web application are instructions for training ccn and reproducing our analysis. the documentation describes how to analyze models and compare the results to the panel of models that we evaluated here, thereby allowing researchers to immediately compare their models to the broader field in a comprehensive and standard fashion.

online methods

training general cancercellnet classifier

to generate training data sets, we downloaded , patient tumor rna-seq expression count matrices and their corresponding sample tables across different tumor types from tcga using the tcgaworkflowdata, tcgabiolinks and summarizedexperiment packages. we used all the patient tumor samples for training the general ccn classifier. we limited training and analysis of rna-seq data to the , genes in common between the tcga dataset and all the query samples (ccls, pdxs, gemms, and tumoroids). to train the top pair random forest classifier, we used a method similar to our previous method.
ccn first normalized the training counts matrix by down-sampling the counts to , counts per sample. to significantly reduce the execution time and memory needed to generate gene pairs for all possible genes, ccn then selected n up-regulated genes, n down-regulated genes and n least differentially expressed genes (ccn training parameter ntopgenes = n) for each of the cancer categories using template matching, and used these genes to generate top scoring gene pairs. in short, for each tumor type, ccn defined a template vector that labelled the training tumor samples in the cancer type of interest as 1 and all other tumor samples as 0. ccn then calculated the pearson correlation coefficient between the template vector and the expression of each gene. genes that strongly matched the template, as either upregulated or downregulated, had a large absolute pearson correlation coefficient. ccn chose the upregulated, downregulated and least differentially expressed genes based on the magnitude of the pearson correlation coefficient. after ccn selected the genes for each cancer type, ccn generated gene pairs among those genes. gene pair transformation was a method inspired by the top-scoring pair classifier to allow compatibility of the classifier with query expression profiles that were collected through different platforms (e.g. microarray query data applied to rna-seq training data). in brief, the gene pair transformation compares genes within an expression sample and encodes the “gene _gene ” gene-pair as 1 if the first gene has higher expression than the second gene.
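the template matching and gene pair transformation steps described here can be sketched in a few lines. this is a minimal python illustration, not the cancercellnet r implementation: the function names, the toy data layout (a dictionary of gene name to per-sample expression values) and the 1/0 encoding of the template and of the gene pairs are assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def template_scores(expr, labels, target):
    """Correlate each gene with a template that is 1 for `target` samples, else 0.

    expr maps gene name -> list of expression values (one per sample);
    labels holds the tumor type of each sample (hypothetical layout).
    """
    template = [1.0 if label == target else 0.0 for label in labels]
    return {gene: pearson(values, template) for gene, values in expr.items()}


def select_genes(scores, n):
    """Pick n up-regulated, n down-regulated and n least differential genes."""
    ranked = sorted(scores, key=scores.get)            # ascending correlation
    down = ranked[:n]                                  # most negative r
    up = ranked[::-1][:n]                              # most positive r
    flat = sorted(scores, key=lambda g: abs(scores[g]))[:n]  # |r| closest to 0
    return up, down, flat


def gene_pair_transform(sample, pairs):
    """Encode each (gene_a, gene_b) pair as 1 if gene_a > gene_b, else 0.

    sample maps gene name -> expression in one sample.
    """
    return {f"{a}_{b}": int(sample[a] > sample[b]) for a, b in pairs}
```

because the encoding only compares genes within a sample, the same pair features can be computed on rna-seq or microarray profiles without rescaling, which is the compatibility property the text attributes to the transformation.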
otherwise, gene pair transformation would encode the gene-pair as 0. using all the gene pair combinations generated through the gene sets per cancer type, ccn then selected the top m discriminative gene pairs (ccn training parameter ntopgenepairs = m) for each category using the template matching described above (selecting pairs with a large absolute pearson correlation coefficient). to prevent any single gene from dominating the gene pair list, we allowed each gene to appear a maximum of three times among the gene pairs selected as features per cancer type. after the top discriminative gene pairs were selected for each cancer category, ccn grouped all the gene pairs together and gene pair transformed the training samples into a binary matrix with all the discriminative gene pairs as row names and all the training samples as column names. using the binary gene pair matrix, ccn randomly shuffled the binary values across rows then across columns to generate random profiles that should not resemble training data from any of the cancer categories. ccn then sampled random profiles, annotated them as “unknown” and used them as training data for the “unknown” category. using the gene pair binary training matrix, ccn constructed a multi-class random forest classifier of trees and used stratified sampling of sample size to ensure balance of training data in constructing the decision trees. to identify the best set of genes and gene-pair parameters (n and m), we used a grid-search cross-validation strategy with cross-validations at each parameter set. the specific parameters for the final ccn classifier using the function “broadclass_train” in the package cancercellnet are in supp tab . the gene-pairs are in supp tab .

validating general cancercellnet classifier

two thirds of patient tumor data from each cancer type were randomly sampled as training data to construct a ccn classifier. based on the training data, ccn selected the classification genes and gene-pairs and trained a classifier. after the classifier was built,
after the classifier was built, .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / held-out samples from each cancer category were sampled and “unknown” profiles were generated for validation. the process of randomly sampling training set from / of all patient tumor data, selecting features based on the training set, training classifier and validating was repeated times to have a more comprehensive assessment of the classifier trained with the optimal parameter set. to test the performance of final ccn on independent testing data, we applied it to profiles from icgc spanning projects that do not overlap with tcga (brca- kr, liri-jp, ov-au, paca-au, paca-ca, prad-fr). selecting decision thresholds our strategy for selecting a decision threshold was to find the value that maximizes the average macro f measure for each of the cross-validations that were performed with the optimal parameter set, testing thresholds between and with a . increment. the f measure is defined as: 𝑀𝑎𝑐𝑟𝑜 𝐹 = × 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 × 𝑟𝑒𝑐𝑎𝑙𝑙 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 + 𝑟𝑒𝑐𝑎𝑙𝑙 we selected the most commonly occurring threshold above . that maximized the average macro f measure across the cross-validations as the decision threshold for the final classifier (threshold = . ). the same approach was applied for the subtype classifiers. the thresholds and the corresponding average precision, recall and f measures are recorded in (supp tab ). 
classifying query data into general cancer categories

we downloaded the rna-seq cancer cell line expression profiles and sample table from (https://portals.broadinstitute.org/ccle/data), and microarray cancer cell line expression profiles and sample table from barretina et al. we extracted two wt control nccit rna-seq expression profiles from grow et al. we received pdx expression estimates and sample annotations from the authors of gao et al. we gathered gemm expression profiles from nine different studies. we downloaded tumoroid expression profiles from the nci patient-derived models repository (pdmr) and from three individual studies. to use the ccn classifier on gemm data, the mouse genes from gemm expression profiles were converted into their human homologs. the query samples were classified using the final ccn classifier. each query classification profile was labelled as one of four classification categories, “correct”, “mixed”, “none” and “other”, based on its classification profile. if a sample has a ccn score higher than the decision threshold in the labelled cancer category, we assigned it as “correct”. if a sample has a ccn score higher than the decision threshold in the labelled cancer category and in other cancer categories, we assigned it as “mixed”. if a sample has no ccn score higher than the decision threshold in any cancer category, or has its highest ccn score in the ‘unknown’ category, then we assigned it as “none”.
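the labelling rules in this section, including the “other” rule stated in the next sentence, can be written as a small decision function. this is a python sketch under stated assumptions rather than the package's own code: the function name and input layout are hypothetical, and the precedence of the “none” rule over the other rules when ‘unknown’ is the top-scoring category is an assumption, since the text does not spell out the order in which the rules are applied.

```python
def label_query(scores, annotated, threshold):
    """Label one query sample as 'correct', 'mixed', 'none' or 'other'.

    scores:    {category: ccn score}, may include an 'unknown' category
    annotated: the cancer category the sample is annotated as
    threshold: decision threshold of the classifier
    """
    top_category = max(scores, key=scores.get)
    above = {
        category
        for category, score in scores.items()
        if category != "unknown" and score > threshold
    }
    # Assumed precedence: 'none' is checked first.
    if not above or top_category == "unknown":
        return "none"
    if annotated in above:
        # Only the annotated category passes -> correct; others pass too -> mixed.
        return "correct" if above == {annotated} else "mixed"
    return "other"
```

for example, with an illustrative threshold of 0.25, a sample annotated as brca with scores `{"BRCA": 0.4, "LUAD": 0.4, "unknown": 0.2}` would be labelled “mixed”.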
if a sample has a ccn score higher than the decision threshold in one or more cancer categories, none of which is the labelled cancer category, we assigned it as “other”. we analyzed and visualized the results using r and the r packages pheatmap and ggplot2.

cross-species assessment

to assess the performance of cross-species classification, we downloaded labelled human tissue/cell type and labelled mouse tissue/cell type rna-seq expression profiles from github (https://github.com/pcahan /cellnet). we first converted the mouse genes into human homologous genes. then we found the intersecting genes between the mouse tissue/cell expression profiles and the human tissue/cell expression profiles. limiting the input of human tissue rna-seq profiles to the intersecting genes, we trained a ccn classifier with all the human tissue/cell expression profiles. the parameters used for the function “broadclass_train” in the package cancercellnet are in supp tab . we randomly sampled samples from each tissue category in the mouse tissue/cell data and applied the classifier to those samples to assess performance.

cross-technology assessment

to assess the performance of ccn in applications to microarray data, we gathered , patient tumor microarray profiles across different cancer types from more than different projects (supp tab ). we found the intersecting genes between the microarray profiles and tcga patient rna-seq profiles.
limiting the input of rna-seq profiles to the intersecting genes, we created a ccn classifier with all the tcga patient profiles using the parameters for the function “broadclass_train” listed in supp tab . after the microarray-specific classifier was trained, we randomly sampled microarray patient samples from each cancer category and applied the ccn classifier to them to assess cross-technology performance (supp fig a). the same ccn classifier was used to assess microarray ccl samples (supp fig b).

training and validating scrna-seq classifier

we extracted labelled human melanoma and glioblastoma scrna-seq expression profiles and compiled the two datasets, excluding the cell types t.cd , t.cd and myeloid because they had too few cells for training. cells from each of the cell types were sampled for training a scrna-seq classifier. the parameters for training a general scrna-seq classifier using the function “broadclass_train” are in supp tab . cells from each of the cell types in the held-out data were selected to assess the single-cell classifier. using maximization of the average macro f1 measure, we selected the decision threshold of . . the gene pairs that were selected to construct the classifier are in supp tab . to assess the cross-technology capability of applying the scrna-seq classifier to bulk rna-seq, we downloaded expression profiles spanning purified cell types (b cells, endothelial cells, monocyte/macrophage, fibroblast) from https://github.com/pcahan /cellnet.
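The cross-species and cross-technology assessments above share one preprocessing step: training is restricted to genes present in both the training and the query data, after mouse genes have been renamed to their human homologs where needed. The sketch below illustrates that step only. All function names are hypothetical, the homolog table is a plain dictionary supplied by the caller, and keeping the first mouse gene when several map to the same human gene is an assumption.

```python
def to_human_symbols(mouse_expr, homologs):
    """Rename mouse genes to human homologs, dropping unmapped genes.

    mouse_expr: dict mapping mouse gene -> expression value.
    homologs: dict mapping mouse gene -> human gene symbol.
    """
    human_expr = {}
    for gene, value in mouse_expr.items():
        human_gene = homologs.get(gene)
        # Keep the first mouse gene seen for each human symbol (assumption).
        if human_gene is not None and human_gene not in human_expr:
            human_expr[human_gene] = value
    return human_expr


def shared_genes(profiles_a, profiles_b):
    """Sorted list of genes present in every profile of both collections."""
    gene_sets = [set(p) for p in list(profiles_a) + list(profiles_b)]
    return sorted(set.intersection(*gene_sets))


def restrict(profile, genes):
    """Keep only the listed genes, in the listed order."""
    return {gene: profile[gene] for gene in genes}
```

A classifier trained on `restrict(profile, genes)` for every training profile then sees exactly the features that can also be computed for the query platform or species.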
training subtype cancercellnet

we found cancer types (brca, coad, esca, hnsc, kirc, lgg, paad, ucec, stad, luad, lusc) which have meaningful subtypes based on either histology or molecular profile and have sufficient samples to train a subtype classifier with high aupr. we also included normal tissue samples from brca, coad, hnsc, kirc and ucec to create a normal tissue category in the construction of their subtype classifiers. training samples were labelled either as a cancer subtype of the cancer of interest or as “unknown” if they belonged to other cancer types. similar to general classifier training, ccn performed the gene pair transformation and selected the most discriminative gene pairs for each cancer subtype. in addition to the gene pairs selected to discriminate cancer subtypes, ccn also performed general classification of all training data and appended the classification profiles of the training data to the gene pair binary matrix as additional features. the reason for using the general classification profile as additional features is that many general cancer types may share similar subtypes, so the general classification profile could be an important feature for discriminating the general cancer type of interest from other cancer types before performing finer subtype classification. the specific parameters used to train individual subtype classifiers with the “subclass_train” function of the cancercellnet package can be found in supp tab and the gene pairs are in supp tab .

validating subtype cancercellnet

similar to validating the general classifier, we randomly sampled / of all samples in each cancer subtype as training data and sampled an equal amount across subtypes from the / held-out data for assessing subtype classifiers. we repeated the process times for a more comprehensive assessment of the subtype classifiers.
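The feature vector used for subtype training combines binary gene-pair features with the general classification profile. A minimal sketch of how such a vector could be assembled for one sample is shown below. It assumes that a gene pair is encoded as 1 when the first gene is expressed more highly than the second and 0 otherwise, which is one common form of pair transform; the function names are hypothetical and this is not the code of the cancercellnet package.

```python
def pair_transform(expr, pairs):
    """Binary gene-pair features for one sample.

    expr: dict mapping gene -> expression value.
    pairs: list of (gene_a, gene_b) tuples selected during training.
    Returns 1 where gene_a is expressed above gene_b, else 0 (assumed
    encoding).
    """
    return [1 if expr[a] > expr[b] else 0 for a, b in pairs]


def subtype_features(expr, pairs, general_profile, categories):
    """Gene-pair features followed by the general classification scores.

    general_profile: dict mapping general cancer category -> score.
    categories: fixed ordering of the general categories so that every
    sample yields features in the same column order.
    """
    return pair_transform(expr, pairs) + [general_profile[c] for c in categories]
```

Because the pair features depend only on the ordering of two genes within a sample, they are unaffected by per-sample scaling of the expression values.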
classifying query data into subtypes

we assigned a subtype to a query sample if the query sample has a ccn score higher than the decision threshold for that subtype. the decision thresholds for the subtype classifiers are listed in supp tab . if no ccn score exceeds the decision threshold in any subtype, or if the highest ccn score is in the “unknown” category, we assigned that sample as “unknown”. analysis was performed in r and visualizations were generated with the complexheatmap package.

cell culture, immunohistochemistry and histomorphometry

caov- (atcc® htb- ™), sk-ov- (atcc® htb- ™), rt (atcc® htb- ™), and nccit (atcc® crl- ™) cell lines were purchased from atcc. hec- (c ) and a ( - vl) were obtained from addexbio technologies and sigma-aldrich. vcap and pc- . sk-ov- , vcap, and rt were cultured in dulbecco's modified eagle medium (dmem, high glucose, , gibco) with % penicillin-streptomycin-glutamine ( , life technologies); caov- , pc- , nccit, and a were cultured using rpmi- medium ( , gibco) while hec- was cultured in iscove's modified dulbecco's medium (imdm, , gibco). both media were supplemented with % penicillin-streptomycin ( , gibco). all media included % fetal bovine serum (fbs). cells cultured in -well plates were washed twice with pbs and fixed in % buffered formalin for hrs at °c. immunostaining was performed using a standard protocol.
cells were incubated with primary antibodies to goat hoxb ( µg/ml, pa - , invitrogen), mouse wt ( µg/ml, ma - , invitrogen), rabbit pparg ( : , abn , millipore), mouse folh ( µg/ml, um , origene), and rabbit lin a ( : , # , cell signaling) in antibody diluent (s - , dako) at °c overnight, followed by three min washes in tbst. the slides were then incubated with fluorescence-conjugated secondary antibodies at room temperature for h while protected from light, followed by three min washes in tbst, and nuclei were stained with mounting medium containing dapi. images were captured by nikon eclipse ti-s, ds-u and ds-qi . histomorphometry was performed using imagej (version . . -rc- / . i). the percentage of positive cells was calculated as the number of positively stained cells divided by the number of dapi-positive nuclei within three randomly chosen areas. the data were expressed as means ± sd.

tumor purity analysis

we used the r package estimate to calculate the estimate scores from the tcga tumor expression profiles that we used as training data for the ccn classifier. to calculate tumor purity we used the equation described in yoshihara et al., :

tumour purity = cos( . + . × estimate score)

extracting citation counts

we used the r package rismed to extract the number of citations for each cell line through a query search of “cell line name[text word] and cancer[text word]” on pubmed. the citation counts were normalized by dividing the citation counts by the number of years since first documented.
$$\text{normalized citation counts} = \frac{\text{citation counts}}{\text{number of years since first documented}}$$

grn construction and grn status

grn construction was extended from our previous method. samples per cancer type were randomly sampled and normalized through down sampling as training data for the clr grn construction algorithm. cancer type specific grns were identified by determining the differentially expressed genes for each cancer type and extracting the subnetwork using those genes.

to extend the original grn status algorithm across different platforms and species, we devised a rank-based grn status algorithm. like the original grn status, the rank-based grn status is a metric for assessing the similarity of the cancer type specific grn between training data in the cancer type of interest and query samples. hence, a high grn status represents a high level of establishment or similarity of the cancer specific grn in the query sample compared to that of the training data. the expression profiles of training data and query data were transformed into rank expression profiles by replacing the expression values with the rank of the expression values within a sample (the highest expressed gene would have the highest rank and the lowest expressed gene would have a rank of ). the cancer type specific mean and standard deviation of every gene’s rank expression were learned from training data.
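The rank transformation and the per-gene statistics learned from training data can be sketched as follows. This is an illustration rather than the authors' implementation: the function names are hypothetical, ranks are assumed to start at 1 for the lowest expressed gene, and tied expression values are assumed to share their average rank.

```python
from statistics import mean, stdev


def rank_profile(expr):
    """Replace expression values with within-sample ranks.

    expr: dict mapping gene -> expression value for one sample.
    The lowest value gets rank 1 (assumption); ties share the average
    of the ranks they span.
    """
    genes = sorted(expr, key=expr.get)
    ranks = {}
    i = 0
    while i < len(genes):
        j = i
        # Extend j over the run of genes tied with genes[i].
        while j + 1 < len(genes) and expr[genes[j + 1]] == expr[genes[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for gene in genes[i:j + 1]:
            ranks[gene] = average_rank
        i = j + 1
    return ranks


def learn_rank_stats(training_profiles):
    """Per-gene (mean, standard deviation) of ranks across training samples.

    training_profiles: list of expression dicts that share the same genes
    and all belong to one cancer type. Needs at least two samples.
    """
    ranked = [rank_profile(p) for p in training_profiles]
    stats = {}
    for gene in ranked[0]:
        values = [r[gene] for r in ranked]
        stats[gene] = (mean(values), stdev(values))
    return stats
```

Working on ranks rather than raw values is what lets the same statistics be compared against query samples from a different platform or species, since only the within-sample ordering of genes is used.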
the modified z-score values for genes within the cancer type specific grn were calculated from the query sample’s rank expression profile to quantify how dissimilar the expression values of genes in the query sample’s cancer type specific grn are from those of the reference training data:

$$Z\text{score}(\text{gene } i)_{\text{mod}} = \begin{cases} 0, & \text{if } Z\text{score is positive and the gene is found to be upregulated} \\ 0, & \text{if } Z\text{score is negative and the gene is found to be downregulated} \\ |Z\text{score}|, & \text{otherwise} \end{cases}$$

if a gene in the cancer type specific grn is found to be upregulated in the specific cancer type relative to other cancer types, then we consider the query sample’s gene to be similar if its ranking is equal to or greater than the mean ranking of the gene in the training samples. because the gene is similar, we assign it a z-score of 0. the same principle applies to cases where the gene is downregulated in the cancer specific subnetwork. the grn status for a query sample is calculated as the weighted mean of $(C - Z\text{score}(\text{gene } i)_{\text{mod}})$ across the genes in the cancer type specific grn, where $C$ is an arbitrary large number. a larger dissimilarity between the query’s cancer type specific grn and that of the training data gives higher z-scores for the grn genes and a lower grn status.

$$RGS = \sum_{i=1}^{n} \left(C - Z\text{score}(\text{gene } i)_{\text{mod}}\right) \times \text{weight}_{\text{gene } i}$$

$$\text{GRN status} = \frac{RGS}{\sum_{i=1}^{n} \text{weight}_{\text{gene } i}}$$

the weight of individual genes in the cancer specific network is determined by the importance of the gene in the random forest classifier. finally, the grn status is normalized with respect to the grn status of the cancer type of interest and the cancer type with the lowest mean grn status.
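The modified z-score and the weighted mean that gives the raw (not yet normalized) GRN status can be sketched as below. The names are hypothetical and the arbitrary large number is left as a parameter, `large_constant`, supplied by the caller. The sketch assumes every gene has a non-zero standard deviation of rank in the training data.

```python
def modified_zscore(rank, mean_rank, sd_rank, direction):
    """Dissimilarity of one query gene from the training reference.

    direction: "up" if the gene is upregulated in the cancer type of
    interest, "down" if it is downregulated.
    Returns 0 when the query deviates in the expected direction,
    otherwise the absolute z-score.
    """
    z = (rank - mean_rank) / sd_rank  # assumes sd_rank > 0
    if direction == "up" and z > 0:
        return 0.0
    if direction == "down" and z < 0:
        return 0.0
    return abs(z)


def grn_status(query_ranks, stats, grn_genes, weights, large_constant):
    """Weighted mean of (large_constant - modified z-score) over GRN genes.

    query_ranks: dict gene -> rank in the query sample.
    stats: dict gene -> (mean rank, sd of rank) from training data.
    grn_genes: dict gene -> "up" or "down".
    weights: dict gene -> importance weight of the gene.
    """
    weighted_sum = 0.0
    for gene, direction in grn_genes.items():
        mean_rank, sd_rank = stats[gene]
        z_mod = modified_zscore(query_ranks[gene], mean_rank, sd_rank, direction)
        weighted_sum += (large_constant - z_mod) * weights[gene]
    return weighted_sum / sum(weights[gene] for gene in grn_genes)
```

With this form, a query sample whose GRN genes all deviate in the expected direction reaches the maximum value, `large_constant`, and every mismatched gene lowers the status in proportion to its weight.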
$$\text{normalized GRN status} = \frac{\text{GRN status}_{\text{query}} - \operatorname{avg}(\text{GRN status}_{\text{min cancer}})}{\operatorname{avg}(\text{GRN status}_{\text{cancer type interest}})}$$

where “min cancer” represents the cancer type whose training data have the lowest mean grn status in the cancer type of interest, and $\operatorname{avg}(\text{GRN status}_{\text{min cancer}})$ represents the lowest average grn status in the cancer type of interest. $\operatorname{avg}(\text{GRN status}_{\text{cancer type interest}})$ represents the average grn status of the cancer type of interest in the training data.

code availability

cancercellnet code and documentation are available at github: https://github.com/pcahan /cancercellnet

acknowledgements

this work was supported by the national institutes of health nci ovarian cancer spore p ca via a development research program award to pc. fwh was supported by a prostate cancer foundation young investigator award, department of defense w xwh- - pcrp-hd (f.w.h.), the national institutes of health/national cancer institute p ca - (f.w.h.) u ca (f.w.h.). we would like to thank john powers, hao zhu, tian-li wang, charles eberhart, and kaloyan tsanov for comments on the manuscript and helpful discussions. some figures were created in part with biorender.com.

figure legends

fig. cancercellnet (ccn) workflow, training, and performance. (a) schematic of ccn usage. ccn was designed to assess and compare the expression profiles of cancer models such as ccls, pdxs, gemms, and tumoroids with native patient tumors. to use a trained classifier, ccn takes the query samples as input (e.g. expression profiles from ccls, pdxs, gemms, tumoroids) and generates a classification profile for the query samples.
the column names of the classification heatmap represent sample annotation and the row names of the classification heatmap represent different cancer types. each grid is colored from black to yellow representing the lowest classification score (e.g. ) to highest classification score (e.g. ). (b) schematic of ccn training process. ccn uses patient tumor expression profiles of different cancer types from tcga as training data. first, ccn identifies n genes that are upregulated, n that are downregulated, and n that are relatively invariant in each tumor type versus all of the others. then, ccn performs a pair transform on these genes and subsequently selects the most discriminative set of m gene pairs for each cancer type as features (or predictors) for the random forest classifier. lastly, ccn trains a multi-class random forest classifier using gene-pair transformed training data. (c) parameter optimization strategy. cross-validations of each parameter set, in which / of tcga data was used to train and / to validate, were used to search for the values of n and m that maximized performance of the classifier as measured by area under the precision recall curve (auprc). (d) mean and standard deviation of classifiers based on cross-validations with the optimal parameter set. (e) auprc of the final ccn classifier when applied to independent patient tumor data from icgc.

fig. evaluation of cancer cell lines. (a) general classification heatmap of ccls extracted from ccle.
column annotations of the heatmap represent the labelled cancer category of the ccls given by ccle and the row names of the heatmap represent different cancer categories. ccls’ general classification profiles are grouped into the categories correct (red), correct mixed (pink), no classification (light green) and other classification (dark green) based on the decision threshold of . . (b) bar plot representing the proportion of each classification category in ccls across cancer types, ordered from the cancer types with the highest proportion of correct and correct mixed ccls to the lowest proportion. (c) comparison between skcm general ccn scores from the bulk rna-seq classifier and skcm malignant ccn scores from the scrna-seq classifier for skcm ccls. (d) comparison between sarc general ccn scores from the bulk rna-seq classifier and caf ccn scores from the scrna-seq classifier for skcm ccls. (e) comparison between gbm general ccn scores from the bulk rna-seq classifier and gbm neoplastic ccn scores from the scrna-seq classifier for gbm ccls. (f) comparison between sarc general ccn scores and caf ccn scores from the scrna-seq classifier for gbm ccls. the green lines indicate the decision threshold for the scrna-seq classifier and the general classifier.

fig. immunofluorescence of selected cell lines. (a) classification profiles (left) and if expression (middle) of caov- (ov positive control), hec- (ucec positive control) and sk-ov- for wt (ov biomarker) and hoxb (uterine biomarker). the bar plots quantify the average percentage of positive cells for wt (top-right) and hoxb (bottom-right). (b) classification profiles (left) and if expression (middle) of caov- , nccit (germ cell tumor positive control) and a for wt and lin a (germ cell tumor biomarker). classification of nccit was performed using rna-seq profiles of the wt control nccit duplicate from grow et al. the bar plots quantify the average percentage of positive cells for wt (top-right) and lin a (bottom-right).
(c) classification profiles (left) and if expression (middle) of vcap (prad positive control), rt (blca positive control) and pc- for folh (prostate biomarker) and pparg (urothelial biomarker). the bar plots quantify the average percentage of positive cells for folh (top-right) and pparg (bottom-right).

fig. subtype classification of ccls and ccl prevalence. the heatmap visualizations represent subtype classification of (a) ucec ccls, (b) lusc ccls and (c) luad ccls. only samples with ccn scores > . in their nominal tumor type are displayed. (d) comparison of normalized citation counts and general ccn classification scores of ccls. labelled cell lines either have the highest ccn classification score in their labelled cancer category or highest normalized citation count. each citation count was normalized by number of years since first documented on pubmed.

fig. evaluation of patient derived xenografts. (a) general classification heatmap of pdxs. column annotations represent annotated cancer type of the pdxs, and row names represent cancer categories. (b) proportion of classification categories in pdxs across cancer types is visualized in the bar plot and ordered from the cancer type with highest proportion of correct and mixed correct classified pdxs to the lowest. subtype classification heatmaps of (c) ucec pdxs, (d) lusc pdxs and (e) luad pdxs. only samples with ccn scores > . in their nominal tumor type are displayed.

fig. evaluation of genetically engineered mouse models. (a) general classification heatmap of gemms.
column annotations represent annotated cancer type of the gemms, and row names represent cancer categories. (b) proportion of classification categories in gemms across cancer types is visualized in the bar plot and ordered from the cancer type with highest proportion of correct and mixed correct classified gemms to the lowest. subtype classification heatmap of (c) ucec gemms, (d) lusc gemms and (e) luad gemms. only samples with ccn scores > . in their nominal tumor type are displayed.

fig. evaluation of tumoroid models. (a) general classification heatmap of tumoroids. column annotations represent annotated cancer type of the tumoroids, and row names represent cancer categories. (b) proportion of classification categories in tumoroids across cancer types is visualized in the bar plot and ordered from the cancer type with highest proportion of correct and mixed correct classified tumoroids to the lowest. subtype classification heatmap of (c) ucec tumoroids, (d) lusc tumoroids and (e) luad tumoroids. only samples with ccn scores > . in their nominal tumor type are displayed.

fig. comparison of ccls, pdxs, and gemms. box-and-whiskers plot comparing general ccn scores across ccls, gemms, pdxs of five general tumor types (ucec, paad, lusc, luad, lihc).

supplementary information

supplementary figure assessment of ccn general classifier and subtype classifier. (a) mean auprc of repeated grid-search cross-validation for each parameter grid. (b) mean and range of ccn classifier’s pr curves from cross validations based on the optimal feature selection parameters n and m.
(c) auprc of ccn human tissue classifier when applied to mouse tissue data. (d) the schematic of training a subtype classifier in ccn. ccn uses patient tumor expression profiles from the cancer of interest as training data. ccn performs gene-pair transformation and selects the most discriminative gene pairs among the cancer subtypes from training data as features. ccn then applies the general classification to the training data and uses the general classification profile as features in addition to gene pairs for training a random forest classifier. the weight of the general classification profiles as features can be tuned to improve auprc. (e) the mean and standard deviation of auprc for subtype classifiers based on iterations of random sampling of training and held-out data, training the subtype classifier using training data, classification of held-out data, and calculation of recall and precision.

supplementary figure further validation of ccn and classification results. to validate the cross-platform classification performance of ccn, a new classifier for microarray data was trained using rna-seq data from tcga as training data, restricted to the intersecting genes between the rna-seq data and the microarray data. (a) auprc of ccn classifier when applied to tumor profiles assayed on microarrays. (b) classification heatmap of ccls using microarray expression data. (c) pearson correlation between ccn scores of ccle lines generated from rna-seq data and microarray data.
(d) comparison between ccls’ ccn scores and the similarity metric from yu et al, median correlations of transcriptional profiles between ccls and tcga tumors from ccls’ labelled cancer category. (e) comparison of mean tumor purity of training data and mean ccn scores of ccls for each cancer category.

supplementary figure single-cell classification of skcm and gbm cell lines. (a) auprc of the single-cell classifier when applied to scrna-seq held-out data. (b) auprc of the scrna-seq classifier when applied to purified bulk rna samples. (c) single-cell classification of skcm ccls. red bar-plot (top) represents general ccn scores in sarc and blue bar-plot (bottom) represents general ccn scores in skcm. (d) single-cell classification of gbm ccls. red bar-plot (top) represents general ccn scores in sarc and yellow bar-plot (bottom) represents general ccn scores in gbm.

supplementary figure correlation between cancer type specific network grn status and general ccn scores.

supplementary figure proportion of cancer subtypes in different cancer models and tcga tumor data across general cancer types.

supplementary table general classification profiles of ccls. supplementary table subtype classification profiles of ccls. supplementary table general classification profiles of pdxs. supplementary table subtype classification profiles of pdxs. supplementary table general classification profiles of gemms. supplementary table subtype classification profiles of gemms. supplementary table general classification profiles of tumoroids.
supplementary table subtype classification profiles of tumoroids. supplementary table specific parameters used for training of all classifiers. supplementary table gene-pairs selected for final training of ccn general, subtype classifiers and single-cell classifier. supplementary table decision thresholds and the corresponding precision and recall for the general classifier and subtype classifier. supplementary table accessions of tumor microarray data used in validation.

references

. sharma, s. v., haber, d. a. & settleman, j. cell line-based platforms to evaluate the therapeutic efficacy of candidate anticancer agents. nat. rev. cancer , – ( ). . kersten, k., de visser, k. e., van miltenburg, m. h. & jonkers, j. genetically engineered mouse models in oncology research and cancer medicine. embo mol. med. , – ( ). . hidalgo, m. et al. patient-derived xenograft models: an emerging platform for translational cancer research. cancer discov. , – ( ). . drost, j. & clevers, h. organoids in cancer research. nat. rev. cancer , – ( ). . klijn, c. et al. a comprehensive transcriptional portrait of human cancer cell lines. nat. biotechnol. , – ( ). . koren, s. et al. pik ca(h r) induces multipotency and multi-lineage mammary tumours. nature , – ( ). . derose, y. s. et al. tumor grafts derived from women with breast cancer authentically reflect tumor pathology, growth, metastasis and disease outcomes. nat. med. , – ( ). . sharpless, n. e. & depinho, r. a. the mighty mouse: genetically engineered mouse models in cancer drug development. nat. rev. drug discov. , – ( ). . mouradov, d.
et al. colorectal cancer cell lines are representative models of the main molecular subtypes of primary cancer. cancer res. , – ( ). . stuckelberger, s. & drapkin, r. precious gemms: emergence of faithful models for ovarian cancer research. j. pathol. , – ( ). . domcke, s., sinha, r., levine, d. a., sander, c. & schultz, n. evaluating cell lines as tumour models by comparison of genomic profiles. nat. commun. , ( ). . jiang, g. et al. comprehensive comparison of molecular portraits between cell lines and tumors in breast cancer. bmc genomics suppl , ( ). . chen, b., sirota, m., fan-minogue, h., hadley, d. & butte, a. j. relating hepatocellular carcinoma tumor samples and cell lines using gene expression data in translational research. bmc med. genomics suppl , s ( ). . vincent, k. m., findlay, s. d. & postovit, l. m. assessing breast cancer cell lines as tumour models by comparison of mrna expression profiles. breast cancer res. , ( ). . yu, k. et al. comprehensive transcriptomic analysis of cell lines as models of primary tumors across tumor types. nat. commun. , ( ). . najgebauer, h. et al. cellector: genomics-guided selection of cancer in vitro models. cell syst. , – .e ( ). . salvadores, m., fuster-tormo, f. & supek, f. matching cell lines with cancer type and subtype of origin via mutational, epigenomic, and transcriptomic patterns. sci. adv. , ( ). . guernet, a. & grumolato, l. crispr/cas editing of the genome for cancer modeling. methods - , – ( ). . gargiulo, g. next-generation in vivo modeling of human cancers. front. oncol. , ( ). . gao, h. et al. high-throughput screening using patient-derived tumor xenografts to predict clinical trial drug response. nat. med. , – ( ). . cahan, p. et al. cellnet: network biology applied to stem cell engineering. cell , – ( ). . radley, a. h. et al. assessment of engineered cells using cellnet and rna-seq. nat. protoc. , – ( ). . tan, y. & cahan, p. 
singlecellnet: a computational tool to classify single cell rna-seq data across platforms and across species. cell syst. , – .e ( ). . cancer genome atlas network. comprehensive molecular characterization of human colon and rectal cancer. nature , – ( ). . zhang, j. et al. international cancer genome consortium data portal--a one-stop shop for cancer genomics data. database (oxford) , bar ( ). . cancer genome atlas network. comprehensive molecular portraits of human breast tumours. nature , – ( ). . parker, j. s. et al. supervised risk predictor of breast cancer based on intrinsic subtypes. j. clin. oncol. , – ( ). . wilkerson, m. d. et al. lung squamous cell carcinoma mrna expression subtypes are reproducible, clinically important, and correspond to normal cell types. clin. cancer res. , – ( ). . cancer genome atlas research network. electronic address: andrew_aguirre@dfci.harvard.edu & cancer genome atlas research network. integrated genomic characterization of pancreatic ductal adenocarcinoma. cancer cell , – .e ( ). . cancer genome atlas research network et al. integrated genomic characterization of endometrial carcinoma. nature , – ( ). . cancer genome atlas research network et al. integrated genomic characterization of oesophageal carcinoma. nature , – ( ). . cancer genome atlas network. comprehensive genomic characterization of head and neck squamous cell carcinomas. nature , – ( ). . cancer genome atlas research network. comprehensive molecular characterization of clear cell renal cell carcinoma. nature , – ( ). . verhaak, r. g. w. et al.
integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in pdgfra, idh , egfr, and nf . cancer cell , – ( ). . cancer genome atlas research network. comprehensive molecular profiling of lung adenocarcinoma. nature , – ( ). . hu, b. et al. gastric cancer: classification, histology and application of molecular pathology. j. gastrointest. oncol. , – ( ). . barretina, j. et al. the cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. nature , – ( ). . medico, e. et al. the molecular landscape of colorectal cancer cell lines unveils clinically actionable kinase targets. nat. commun. , ( ). . park, j.-g. et al. characteristics of cell lines established from human colorectal carcinoma. cancer res. ( ). . jerby-arnon, l. et al. a cancer cell program promotes t cell exclusion and resistance to checkpoint blockade. cell , – .e ( ). . darmanis, s. et al. single-cell rna-seq analysis of infiltrating neoplastic cells at the migrating front of human glioblastoma. cell rep. , – ( ). . patel, a. p. et al. single-cell rna-seq highlights intratumoral heterogeneity in primary glioblastoma. science , – ( ). . xu, b. et al. regulation of endometrial receptivity by the highly expressed hoxa , hoxa and hoxd hox-class homeobox genes. hum. reprod. , – ( ). . raines, a. m. et al. recombineering-based dissection of flanking and paralogous hox gene functions in mouse reproductive tracts. development , – ( ). . netinatsunthorn, w., hanprasertpong, j., dechsukhum, c., leetanaporn, r. & geater, a. wt gene expression as a prognostic marker in advanced serous epithelial ovarian carcinoma: an immunohistochemical study. bmc cancer , ( ). . kelly, z. et al. the prognostic significance of specific hox gene expression patterns in ovarian cancer. int. j. cancer , – ( ). .cc-by-nc-nd . 
international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / . cancer genome atlas research network. integrated genomic analyses of ovarian carcinoma. nature , – ( ). . wiegand, k. c. et al. arid a mutations in endometriosis-associated ovarian carcinomas. n. engl. j. med. , – ( ). . murray, m. j. et al. lin expression in malignant germ cell tumors downregulates let- and increases oncogene levels. cancer res. , – ( ). . biton, a. et al. independent component analysis uncovers the landscape of the bladder tumor transcriptome and reveals insights into luminal and basal subtypes. cell rep. , – ( ). . fair, w. r., israeli, r. s. & heston, w. d. prostate-specific membrane antigen. prostate , – ( ). . black, j. d., english, d. p., roque, d. m. & santin, a. d. targeted therapy in uterine serous carcinoma: an aggressive variant of endometrial cancer. womens health (lond. engl.) , – ( ). . yang, s., thiel, k. w. & leslie, k. k. progesterone: the ultimate endometrial tumor suppressor. trends endocrinol. metab. , – ( ). . huszar, m. et al. up-regulation of l cam is linked to loss of hormone receptors and e-cadherin in aggressive subtypes of endometrial carcinomas. j. pathol. , – ( ). . kozak, j., wdowiak, p., maciejewski, r. & torres, a. a guide for endometrial cancer cell lines functional assays using the measurements of electronic impedance. cytotechnology , – ( ). . korch, c. et al. dna profiling analysis of endometrial and ovarian cell lines reveals misidentification, redundancy and contamination. gynecol. oncol. , – ( ). . wu, d. et al. gene-expression data integration to squamous cell lung cancer subtypes reveals drug sensitivity. br. j. cancer , – ( ). 
. walter, v. et al. molecular subtypes in head and neck cancer exhibit distinct patterns of chromosomal gain and loss of canonical cancer genes. plos one , e ( ). . adeegbe, d. o. et al. bet bromodomain inhibition cooperates with pd- blockade to facilitate antitumor response in kras-mutant non-small cell lung cancer. cancer immunol res , – ( ). . blaisdell, a. et al. neutrophils oppose uterine epithelial carcinogenesis via debridement of hypoxic tumor cells. cancer cell , – ( ). . fitamant, j. et al. yap inhibition restores hepatocyte differentiation in advanced hcc, leading to tumor regression. cell rep. , – ( ). . jia, d. et al. crebbp loss drives small cell lung cancer and increases sensitivity to hdac inhibition. cancer discov. , – ( ). . kress, t. r. et al. identification of myc-dependent transcriptional programs in oncogene-addicted liver tumors. cancer res. , – ( ). . li, l. et al. gkap acts as a genetic modulator of nmdar signaling to govern invasive tumor growth. cancer cell , – .e ( ). . mollaoglu, g. et al. the lineage-defining transcription factors sox and nkx - determine lung cancer cell fate and shape the tumor immune microenvironment. immunity , – .e ( ). .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / . pan, y. et al. whole tumor rna-sequencing and deconvolution reveal a clinically- prognostic pten/pi k-regulated glioma transcriptional signature. oncotarget , – ( ). . lissanu deribe, y. et al. mutations in the swi/snf complex induce a targetable dependence on oxidative phosphorylation in lung cancer. nat. med. , – ( ). . xu, c. et al. 
loss of lkb and pten leads to lung squamous cell carcinoma with elevated pd-l expression. cancer cell , – ( ). . nci-frederick, frederick, md. national laboratory for cancer research. the nci patient-derived models repository (pdmr). ( ). at . broutier, l. et al. human primary liver cancer-derived organoid cultures for disease modeling and drug screening. nat. med. , – ( ). . lee, s. h. et al. tumor evolution and drug response in patient-derived organoid models of bladder cancer. cell , – .e ( ). . ogawa, j., pao, g. m., shokhirev, m. n. & verma, i. m. glioblastoma model using human cerebral organoids. cell rep. , – ( ). . ben-david, u. et al. patient-derived xenografts undergo mouse-specific tumor evolution. nat. genet. , – ( ). . stratton, m. r., campbell, p. j. & futreal, p. a. the cancer genome. nature , – ( ). . balkwill, f. r., capasso, m. & hagemann, t. the tumor microenvironment at a glance. j. cell sci. , – ( ). . lancaster, m. a. & knoblich, j. a. organogenesis in a dish: modeling development and disease using organoid technologies. science , ( ). . bregenzer, m. e. et al. integrated cancer tissue engineering models for precision medicine. plos one , e ( ). . wang, d. h. & souza, r. f. biology of barrett’s esophagus and esophageal adenocarcinoma. gastrointest endosc clin n am , – ( ). . lee, j. et al. tumor stem cells derived from glioblastomas cultured in bfgf and egf more closely mirror the phenotype and genotype of primary tumors than do serum-cultured cell lines. cancer cell , – ( ). . wenger, s. l. et al. comparison of established cell lines at different passages by karyotype and comparative genomic hybridization. biosci. rep. , – ( ). . ben-david, u. et al. genetic and transcriptional evolution alters cancer cell line drug response. nature , – ( ). . cooke, s. l. et al. genomic analysis of genetic heterogeneity and evolution in high- grade serous ovarian carcinoma. oncogene , – ( ). . hristova, v. a. & chan, d. w. 
cancer biomarker discovery and translation: proteomics and beyond. expert rev proteomics , – ( ). . dawson, m. a. & kouzarides, t. cancer epigenetics: from mechanism to therapy. cell , – ( ). . silva, t. c. et al. tcga workflow: analyze cancer genomics and epigenomics data using bioconductor packages. [version ; peer review: approved, approved with reservations]. f res. , ( ). . morgan, m., obenchain, v., hester, j. & pag`es, h. summarizedexperiment: summarizedexperiment container. ( ). .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / . pavlidis, p. & noble, w. s. analysis of strain and regional variation in gene expression in mouse brain. genome biol. , research ( ). . geman, d., d avignon, c., naiman, d. q. & winslow, r. l. classifying gene expression profiles from pairwise mrna comparisons. stat appl genet mol biol , article ( ). . krstajic, d., buturovic, l. j., leahy, d. e. & thomas, s. cross-validation pitfalls when selecting and assessing regression and classification models. j. cheminform. , ( ). . lipton, z. c., elkan, c. & naryanaswamy, b. optimal thresholding of classifiers to maximize f measure. mach. learn. knowl. discov. databases , – ( ). . grow, e. j. et al. intrinsic retroviral reactivation in human preimplantation embryos and pluripotent cells. nature , – ( ). . kolde, r. pheatmap: pretty heatmaps. (cran, ). . wickham, h. ggplot - elegant graphics for data analysis . (springer-verlag new york, ). doi: . / - - - - . gu, z., eils, r. & schlesner, m. complex heatmaps reveal patterns and correlations in multidimensional genomic data. bioinformatics , – ( ). . yoshihara, k. et al. 
[Figure: CancerCellNet (CCN) overview. Recoverable labels: cancer models (cancer cell lines (CCL), patient-derived xenografts (PDX), genetically engineered mouse models (GEMM), tumoroids) scored against cancer types on a classification score scale from low to high; training process (labeled TCGA RNA-seq data, select n genes, gene pair transform, select m gene pairs, train random forest classifier); parameter selection (set parameters n and m, run the training process on a random subset of the TCGA data, assess performance on the held-out data, repeat, select the parameter set with maximum mean AUPRC, then train on all TCGA data).]

[Figures: two multi-panel figures with a CCN score color scale; no other labels are recoverable.]

[Figures: four multi-panel figures showing general classification (general CCN score) and sub-type classification for UCEC (endometrioid, serous, normal, unknown), LUSC (basal, classical, primitive, secretory, unknown) and LUAD (prox.-inflam, prox.-prolif, TRU, unknown); one of them additionally annotates genotype.]

[Supplemental figure: broad class classification. Recoverable labels: TCGA RNA-seq training data (genes by samples), gene pair transform, feature selection, train random forest classifier; the CCN scores of the broad class classifier are added to the gene pairs as additional features.]

[Supplemental figures: remaining multi-panel figures; no labels are recoverable.]
Semi-supervised Calibration of Risk with Noisy Event Times (SCORNET) Using Electronic Health Record Data

Yuri Ahuja, Liang Liang, Selena Huang, Tianxi Cai

January

Abstract

Leveraging large-scale electronic health record (EHR) data to estimate survival curves for clinical events can enable more powerful risk estimation and comparative effectiveness research. However, use of EHR data is hindered by a lack of direct observations of event times. Occurrence times of relevant diagnostic codes or target disease mentions in clinical notes are at best a good approximation of the true disease onset time. On the other hand, extracting precise information on the exact event time requires laborious manual chart review and is sometimes altogether infeasible due to a lack of detailed documentation. Current status labels (binary indicators of phenotype status during follow-up) are significantly more efficient and feasible to compile, enabling more precise survival curve estimation given limited resources. Existing survival analysis methods using current status labels focus almost entirely on supervised estimation, and naive incorporation of unlabeled data into these methods may lead to biased results.
In this paper we propose Semi-supervised Calibration of Risk with Noisy Event Times (SCORNET), which yields a consistent and efficient survival curve estimator by leveraging a small number of current status labels and a large number of imperfect surrogate features. In addition to providing theoretical justification of SCORNET, we demonstrate in both simulation and real-world EHR settings that SCORNET achieves efficiency akin to the parametric Weibull regression model, while also exhibiting non-parametric flexibility and relatively low empirical bias in a variety of generative settings.

bioRxiv preprint (not certified by peer review); this version posted January , . ; doi: https://doi.org/ . / . . . . The copyright holder for this preprint is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC . International license (http://creativecommons.org/licenses/by-nc/ . /).

Introduction

The electronic health record (EHR) has in recent years become an increasingly available source of data for clinical and translational research (Kohane and others; Hripcsak and Albers; Miotto and others). Comprising heterogeneous clinical encounters including diagnostic and procedural billing codes, lab tests, prescriptions, and free-text clinical notes for millions of patients, these rich data offer abundant opportunities for in silico epidemiological analysis. One application that has garnered recent interest is estimation of population disease risk within EHR patient cohorts, which can enable more powerful and precise estimation of real-world disease risks as well as comparative effectiveness analysis of alternative treatment strategies (Hodgkins and others; Dean and others; Liu and others; Panahiazar and others; Steele and others). Several studies have had success estimating time to death within rule-defined disease cohorts (Panahiazar and others; Steele and others).
However, estimating the temporal risk of developing a disease is a more challenging task because the EHR lacks direct observations of either disease status or the timing of disease onset. Convenient proxies of disease status or onset time based on readily available features such as International Classification of Diseases (ICD) codes often exhibit low specificity and systematic temporal biases, potentially yielding highly biased disease risk estimators if used as event time labels (Cipparone and others; Uno and others). On the other hand, extracting precise information on disease outcomes requires labor-intensive manual chart review, which is particularly challenging for event times since the event may occur outside of the hospital system and only be mentioned during follow-up visits. It is thus only practically feasible to annotate the current status $\Delta = I(T \le C)$ of the event time $T$, where $C$ is the follow-up time.

In this paper, we consider the problem of estimating the disease risk $F(t) = P(T \le t)$ when only a small number of labels on $\Delta$ and a large quantity of unlabeled EHR features $\mathbf{W}$, including proxies of $T$, are available. Supervised survival curve estimation with current status data on $\{\Delta, C\}$ is well established in the statistical literature, with several available parametric, semi-parametric and non-parametric procedures (e.g. Vardi; Huang; van der Laan and Robins; van der Laan and Jewell; Lin and others).
For example, van der Laan and Robins proposed a non-parametric, locally efficient estimator via inverse probability of censoring weighting (IPCW), assuming that (1) $T$ and $C$ are conditionally independent given some informative baseline covariates $\mathbf{Z}_0 \subset \mathbf{W}$ (e.g. age, sex, etc.) and (2) a consistent estimator for the conditional density of $C \mid \mathbf{Z}_0$ is available. However, these existing estimators do not leverage unlabeled EHR feature information such as time to first surrogate ICD code, which may greatly improve risk estimation precision.

Since $\mathbf{W}$ may be highly predictive of $T$, the estimation of $S(t)$ can potentially be improved via semi-supervised learning (SSL) leveraging both the small set of $\Delta$ observations in the labeled set and the EHR features $\mathbf{W}$ in the unlabeled set. SSL has been shown to significantly mitigate bias and/or improve efficiency for various risk prediction applications (Chai and others; Liang and others; Bair and Tibshirani; Golub and others). For example, several studies employ semi-parametric models to impute event times in the unlabeled set for subsequent input into an outcome survival model alongside labeled data (Chai and others; Liang and others; Zhao and others; Uno and others; Hassett and others; Chubak and others; Choi and others; Kaji and others; Ruan and others; Ahuja and others). While such imputation approaches may improve efficiency under correct specification of the imputation model, they are subject to significant bias if the imputation model is misspecified. In addition, these existing methods do not allow for use of current status labels for training. Other general augmented inverse probability weighting methods in the missing data literature (e.g. Seaman and White; Rotnitzky and Robins) are not directly applicable here since the probabilities of labels being observed tend to zero in the SSL setting.

We address this shortcoming by proposing Semi-supervised Calibration of Risk with Noisy Event Times (SCORNET) for estimation of $S(t)$.
SCORNET utilizes current status labels while also employing a robust semi-supervised imputation approach on the extensive unlabeled set to maximize survival estimation efficiency. To mitigate imputation bias and maximize efficiency gain from the unlabeled data, SCORNET utilizes a highly flexible semi-non-parametric kernel regression model with EHR features as covariates, which ensures the validity of the resulting risk estimator without requiring the imputation model to hold. In addition to providing theoretical justifications for the SCORNET estimator, we illustrate via simulation studies that SCORNET substantially outperforms existing methods with regard to the bias-variance tradeoff.

The rest of the paper is organized as follows. In the Methods section, we detail the SCORNET procedure along with its associated inference procedures. In the simulation study, we report risk estimation performance relative to existing methods in diverse simulation settings. To further illustrate the utility of SCORNET in clinical applications, we then apply it to a real-world EHR study estimating the risk of heart failure among rheumatoid arthritis patients. Finally, we briefly discuss the strengths, weaknesses, and potential applications of SCORNET.

Methods

Setup

Let $T$ denote the event time for which we are interested in estimating the cumulative distribution function $F(t) = P(T \le t)$ and survival function $S(t) = 1 - F(t)$. In the EHR study we do not observe $T$ but rather $\Delta = I(T \le C)$ for a small labeled subset, where $C$ is the follow-up time with finite support $[0, \tau]$.
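To make the observation scheme concrete, the following Python sketch simulates current status data: the event time is latent, and only the follow-up time and the indicator of whether the event has occurred by follow-up are recorded. It is purely illustrative. The exponential event-time distribution, the uniform follow-up distribution, the value of `tau`, and the function name `simulate_current_status` are assumptions made here, not choices taken from the paper.

```python
import numpy as np

def simulate_current_status(n, rng, tau=10.0):
    """Draw latent event times T, follow-up times C, and labels I(T <= C).

    Illustrative assumptions: T ~ Exponential(mean 5), C ~ Uniform(0, tau),
    and T independent of C.
    """
    T = rng.exponential(scale=5.0, size=n)   # latent event time, never observed
    C = rng.uniform(0.0, tau, size=n)        # follow-up time, always observed
    delta = (T <= C).astype(int)             # current status label
    return T, C, delta

rng = np.random.default_rng(0)
T, C, delta = simulate_current_status(2000, rng)
print("share of subjects with the event by follow-up:", delta.mean())
```

In a real EHR cohort only `C` and, for the labeled subset, `delta` would be available; `T` is returned here only so that the label definition can be checked.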
For all subjects, we also observe a set of baseline covariates $\mathbf{Z}_0$ and longitudinal EHR features $\mathbf{Z}$. Since codes used in the EHR are often highly sensitive but not specific, there often exists some filter variable $\mathcal{F} \in \{0, 1\}$ such that $\Delta \mid (\mathcal{F} = 0, \mathbf{W}) = 0$ almost surely, where $\mathbf{W} = (\mathbf{Z}_0^{\mathsf T}, \mathbf{Z}^{\mathsf T})^{\mathsf T}$. Moreover, we assume that $(T, \mathbf{Z}) \perp C \mid \mathbf{Z}_0$. We assume that data for analysis consist of a small set of $n$ current-status-labeled observations randomly selected among those with $\mathcal{F} = 1$ along with a larger set of $N$ unlabeled observations:

$$\mathcal{D} = \{\mathbf{D}_i = (C_i, V_i \Delta_i, \mathbf{W}_i, V_i, \mathcal{F}_i)^{\mathsf T},\ i = 1, \ldots, N\} = \mathcal{L} \cup \mathcal{U},$$

where $\mathcal{L} = \{(C_i, \Delta_i, \mathbf{W}_i, 1, 1)^{\mathsf T} : \mathcal{F}_i = 1, V_i = 1, i = 1, \ldots, n\}$ and $\mathcal{U} = \{(C_i, 0, \mathbf{W}_i, 0, \mathcal{F}_i)^{\mathsf T} : V_i = 0, i = n + 1, \ldots, N\}$ with $\log(N)/\log(n) \to \nu >$ / as $n \to \infty$.

Since the censoring time $C$ may depend on $\mathbf{Z}_0$, we follow the IPCW strategy of van der Laan and Robins to weight observations by

$$\omega_{t,b}(C_i \mid \mathbf{Z}_{0,i}) = \frac{K_b(C_i - t)}{h_C(t \mid \mathbf{Z}_{0,i})},$$

where $K_b(s) = K(s/b)/b$, $h_C(t \mid \mathbf{Z}_0) = \partial H_C(t \mid \mathbf{Z}_0)/\partial t$, $H_C(t \mid \mathbf{Z}_0) = P(C \le t \mid \mathbf{Z}_0)$, $K(\cdot)$ is some symmetric density function, and $b = O(n^{-\nu})$ is the bandwidth thereof with $\nu \in$ ( / , / ]. IPCW enables consistent estimation of functionals of $I(T \le t)$ and $\mathbf{W}$ since for any reasonable choice of function $q(\cdot)$ and $j, k \in \{0, 1\}$,

$$E\{\Delta_i^{j}\, q(\mathbf{W}_i)\, \mathcal{F}_i^{k}\, \omega_{t,b}(C_i \mid \mathbf{Z}_{0,i})\} = E\{I(T_i \le t)^{j}\, q(\mathbf{W}_i)\, \mathcal{F}_i^{k}\} + O(b^2). \qquad (1)$$

The IPCW estimator for $\pi(t) = P(T \le t \mid \mathcal{F} = 1)$ proposed by van der Laan and Robins essentially corresponds to

$$\hat\pi(t) = \frac{\sum_{i=1}^{n} \mathcal{F}_i \Delta_i\, \omega_{t,b}(C_i \mid \mathbf{Z}_{0,i})}{\sum_{i=1}^{n} \mathcal{F}_i\, \omega_{t,b}(C_i \mid \mathbf{Z}_{0,i})}$$

with $h_C(t \mid \mathbf{Z}_0)$ in $\omega_{t,b}(C_i \mid \mathbf{Z}_{0,i})$ replaced by a consistent estimator that converges faster than $n^{- / }$, which is not difficult to achieve under reasonable modeling assumptions since the distribution of $C \mid \mathbf{Z}_0$ can be estimated using the full data $\mathcal{D}$.
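The kernel IPCW estimator described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: it ignores the filter (every subject is treated as filter-positive), uses an Epanechnikov kernel, and plugs in the true conditional density of the follow-up time instead of an estimate. The simulated design (a binary baseline covariate that shifts both the event and follow-up distributions) and all function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Symmetric kernel density supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def ipcw_current_status(t, C, delta, cens_density, bandwidth):
    """Kernel IPCW estimate of P(T <= t) from current status data.

    cens_density[i] is the conditional density of the follow-up time at t
    given subject i's baseline covariates.
    """
    kernel = epanechnikov((C - t) / bandwidth) / bandwidth  # K_b(C_i - t)
    weights = kernel / cens_density                         # IPCW weights
    return np.sum(weights * delta) / np.sum(weights)

# Hypothetical design: the baseline covariate z0 affects both T and C.
rng = np.random.default_rng(1)
n = 50_000
z0 = rng.integers(0, 2, size=n)
T = rng.exponential(scale=np.where(z0 == 1, 2.0, 5.0))
C = rng.uniform(0.0, np.where(z0 == 1, 5.0, 10.0))
delta = (T <= C).astype(float)

t = 2.0
density_at_t = np.where(z0 == 1, 1.0 / 5.0, 1.0 / 10.0)
est = ipcw_current_status(t, C, delta, density_at_t, bandwidth=0.5)
naive = ipcw_current_status(t, C, delta, np.ones(n), bandwidth=0.5)
truth = 0.5 * (1.0 - np.exp(-t / 5.0)) + 0.5 * (1.0 - np.exp(-t / 2.0))
print(f"IPCW: {est:.3f}  unweighted kernel: {naive:.3f}  truth: {truth:.3f}")
```

The unweighted kernel average over-represents the covariate group whose follow-up times are denser near `t`; dividing by the conditional density removes that imbalance, which is the role of the weight in the text.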
To this end, we propose to derive an estimator for the conditional density $h_C(t \mid \mathbf{Z}_0) = \lambda_C(t \mid \mathbf{Z}_0)\, S_C(t \mid \mathbf{Z}_0)$ by imposing a semi-parametric model for $C \mid \mathbf{Z}_0$. Although many commonly employed models can be used since once again $C$ is fully observed for all patients, we illustrate our proposal by focusing on the Cox proportional hazards model (Cox), under which

$$\lambda_C(t \mid \mathbf{Z}_0) = \lambda_0(t)\, e^{\boldsymbol{\gamma}^{\mathsf T} \mathbf{Z}_0} \quad \text{and} \quad S_C(t \mid \mathbf{Z}_0) \equiv 1 - H_C(t \mid \mathbf{Z}_0) = \exp\{-\Lambda_0(t)\, e^{\boldsymbol{\gamma}^{\mathsf T} \mathbf{Z}_0}\}, \qquad (2)$$

where $\lambda_C(t \mid \mathbf{Z}_0)$ is the conditional hazard function of $C \mid \mathbf{Z}_0$, $\lambda_0(t)$ is the unknown baseline hazard function, $\Lambda_0(t) = \int_0^t \lambda_0(s)\, ds$, and $\boldsymbol{\gamma}$ is the vector of unknown covariate effects.

SCORNET estimation

As outlined in Figure 1, SCORNET consists of three steps: (1) estimating the conditional censoring density $h_C(t \mid \mathbf{Z}_0)$ using $\mathcal{D}$; (2) fitting an imputation working model for $\pi(t \mid \mathbf{W}) \equiv P(T \le t \mid \mathbf{W}, \mathcal{F} = 1)$ using $\mathcal{L}$, denoting the estimate of $\pi(t \mid \mathbf{W})$ as $\hat\pi(t \mid \mathbf{W})$; and (3) estimating $S(t)$ by marginalizing $\hat\pi(t \mid \mathbf{W})\mathcal{F} + \Delta(1 - \mathcal{F}) = \hat\pi(t \mid \mathbf{W})\mathcal{F}$ via IPCW.

Figure 1: Schematic of the SCORNET algorithm.

Step 1: Estimate $h_C(t \mid \mathbf{Z}_0)$ under the Cox model for $C \mid \mathbf{Z}_0$

To estimate $h_C(t \mid \mathbf{Z}_0)$, we fit the Cox model (2) to the full data $\mathcal{D}$ to obtain the partial likelihood estimator $\hat{\boldsymbol{\gamma}}$ for $\boldsymbol{\gamma}$. We subsequently estimate $\Lambda_0(t)$ and $\lambda_0(t)$ respectively as the standard Breslow estimator $\hat\Lambda_0(t)$ and the kernel-smoothed Breslow estimator $\hat\lambda_0(t)$ (Basha and Hoxha), where

$$\hat\Lambda_0(t) = \sum_{i=1}^{N} \frac{I(C_i \le t)}{\sum_{j=1}^{N} I(C_j \ge C_i) \exp(\hat{\boldsymbol{\gamma}}^{\mathsf T} \mathbf{Z}_{0,j})}, \qquad \hat\lambda_0(t) = \sum_{i=1}^{N} \frac{K_{b_N}(C_i - t)}{\sum_{j=1}^{N} I(C_j \ge C_i) \exp(\hat{\boldsymbol{\gamma}}^{\mathsf T} \mathbf{Z}_{0,j})},$$

and $b_N = O(N^{-\nu})$ with $\nu \in$ ( / , / ].
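The two Breslow-type estimators above are straightforward to compute once the Cox coefficients are available. The NumPy sketch below is an illustration under assumptions, not the paper's code: it takes the coefficient vector as given (here the true value used to simulate the data, where in practice a partial likelihood fit would supply it), uses an Epanechnikov kernel, and assumes no tied follow-up times. All function names are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Symmetric kernel density supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def risk_set_sums(C, Z0, gamma):
    """For each subject i, sum exp(gamma' Z0_j) over subjects j with C_j >= C_i."""
    risk = np.exp(Z0 @ gamma)
    order = np.argsort(C)                               # ascending follow-up times
    tail_sums = np.cumsum(risk[order][::-1])[::-1]      # sums over later (or equal) times
    sums = np.empty_like(tail_sums)
    sums[order] = tail_sums                             # back to the original ordering
    return sums

def breslow_cumulative_hazard(t, C, Z0, gamma):
    """Breslow estimate of the baseline cumulative hazard of C at time t."""
    return np.sum((C <= t) / risk_set_sums(C, Z0, gamma))

def smoothed_baseline_hazard(t, C, Z0, gamma, bandwidth):
    """Kernel-smoothed Breslow estimate of the baseline hazard of C at time t."""
    kernel = epanechnikov((C - t) / bandwidth) / bandwidth
    return np.sum(kernel / risk_set_sums(C, Z0, gamma))

# Simulated check: baseline hazard 1, so the cumulative baseline hazard at t is t.
rng = np.random.default_rng(2)
N = 5000
Z0 = rng.normal(size=(N, 1))
gamma = np.array([0.5])                                 # assumed known for this sketch
C = rng.exponential(size=N) / np.exp(Z0 @ gamma)

Lambda_hat = breslow_cumulative_hazard(1.0, C, Z0, gamma)
lambda_hat = smoothed_baseline_hazard(1.0, C, Z0, gamma, bandwidth=0.3)
print(f"cumulative hazard at 1: {Lambda_hat:.3f}, smoothed hazard at 1: {lambda_hat:.3f}")
```

Because every follow-up time is observed, each subject contributes an event term; sorting once and taking reversed cumulative sums avoids the quadratic cost of forming every risk set explicitly.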
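We then obtain $\hat\lambda_C(t \mid \mathbf{Z}_0) = \hat\lambda_0(t)\, e^{\hat{\boldsymbol{\gamma}}^{\mathsf T} \mathbf{Z}_0}$ and $\hat S_C(t \mid \mathbf{Z}_0) = \exp\{-\hat\Lambda_0(t)\, e^{\hat{\boldsymbol{\gamma}}^{\mathsf T} \mathbf{Z}_0}\}$, and we estimate $h_C(t \mid \mathbf{Z}_{0,i})$ as $\hat h_C(t \mid \mathbf{Z}_{0,i}) = \hat\lambda_C(t \mid \mathbf{Z}_{0,i})\, \hat S_C(t \mid \mathbf{Z}_{0,i})$. Following standard asymptotic results for non-parametric kernel regression (Pagan and Ullah), it is not difficult to show that

$$\sup_{\mathbf{Z}_0, t} |\hat h_C(t \mid \mathbf{Z}_0) - h_C(t \mid \mathbf{Z}_0)| = O_p\{\log(N)^{1/2} (N b_N)^{-1/2}\} = o_p(\ ).$$

We denote the resulting estimate for the censoring weight as $\hat\omega_{t,b_n}(C_i \mid \mathbf{Z}_{0,i}) = K_{b_n}(C_i - t)/\hat h_C(t \mid \mathbf{Z}_{0,i})$.

Step 2: Estimate an imputation model $\pi(t \mid \mathbf{W}) \equiv P(T \le t \mid \mathbf{W}, \mathcal{F} = 1)$

To leverage the unlabeled data, we fit a flexible imputation working model

$$\pi(t \mid \mathbf{W}) = g\{\alpha(t) + \boldsymbol{\beta}_0(t)^{\mathsf T} \vec{\mathbf{Z}}_0 + \boldsymbol{\beta}(t)^{\mathsf T} \mathbf{Z}\} = g\{\boldsymbol{\theta}(t)^{\mathsf T} \vec{\mathbf{W}}\}, \qquad (3)$$

where $\mathbf{Z}$ denotes the EHR surrogate features, $\boldsymbol{\theta}(t) = (\alpha(t), \boldsymbol{\beta}_0(t)^{\mathsf T}, \boldsymbol{\beta}(t)^{\mathsf T})^{\mathsf T}$, and $\vec{\mathbf{W}} = (1, \vec{\mathbf{Z}}_0^{\mathsf T}, \mathbf{Z}^{\mathsf T})^{\mathsf T}$. Under (3), $P(T \le t \mid \mathbf{W}, \mathcal{F} = 1) = g\{\boldsymbol{\theta}(t)^{\mathsf T} \vec{\mathbf{W}}\}$, and hence we may estimate $\boldsymbol{\theta}(t)$ as $\hat{\boldsymbol{\theta}}(t) = (\hat\alpha(t), \hat{\boldsymbol{\beta}}_0(t)^{\mathsf T}, \hat{\boldsymbol{\beta}}(t)^{\mathsf T})^{\mathsf T}$, the solution to the IPCW estimating equation evaluated with $\mathcal{L}$,

$$\sum_{i=1}^{n} \hat\omega_{t,b_n}(C_i \mid \mathbf{Z}_{0,i})\, \mathcal{F}_i\, \vec{\mathbf{W}}_i \{\Delta_i - g(\boldsymbol{\theta}^{\mathsf T} \vec{\mathbf{W}}_i)\} = \mathbf{0},$$

where $b_n = O(n^{-\nu})$ with $\nu \in$ ( / , / ). In practice, $b_n$ can be chosen via either standard cross-validation or heuristic plug-in values. For a future observation with filter status $\mathcal{F} = 1$ and covariates $\mathbf{W}$, we impute $I(T \le t)$ as the conditional risk $\hat\pi(t \mid \mathbf{W}) = g\{\hat{\boldsymbol{\theta}}(t)^{\mathsf T} \vec{\mathbf{W}}\}$. It is not difficult to show that $\hat{\boldsymbol{\theta}}(t)$ converges in probability to $\bar{\boldsymbol{\theta}}(t)$, the solution to the limiting estimating equation $E[\vec{\mathbf{W}}\{I(T \le t) - g(\boldsymbol{\theta}^{\mathsf T} \vec{\mathbf{W}})\} \mid \mathcal{F} = 1] = \mathbf{0}$, which ensures that

$$E\{\bar\pi(t \mid \mathbf{W}) \mid \mathcal{F} = 1\} = P(T \le t \mid \mathcal{F} = 1), \quad \text{where } \bar\pi(t \mid \mathbf{W}) = g\{\bar{\boldsymbol{\theta}}(t)^{\mathsf T} \vec{\mathbf{W}}\}, \qquad (4)$$

regardless of the adequacy of the imputation model (3).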
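The estimating equation in Step 2 is a kernel-weighted regression score and can be solved by Newton-Raphson. The sketch below is an illustration under assumptions: the link is taken to be the logistic function (the link is not legible in this copy of the text), the follow-up density is treated as known and uniform, all labeled subjects are filter-positive, and the function names are hypothetical. The working model is deliberately misspecified for the simulated data, to show that the intercept equation still forces the weighted average of imputed risks to match the weighted average of the labels.

```python
import numpy as np

def expit(x):
    """Logistic link, assumed here for illustration."""
    return 1.0 / (1.0 + np.exp(-x))

def solve_weighted_logistic(W, delta, weights, n_iter=50, tol=1e-10):
    """Solve sum_i weights_i * W_i * (delta_i - expit(theta' W_i)) = 0."""
    theta = np.zeros(W.shape[1])
    for _ in range(n_iter):
        p = expit(W @ theta)
        score = W.T @ (weights * (delta - p))
        hessian = (W * (weights * p * (1.0 - p))[:, None]).T @ W
        step = np.linalg.solve(hessian, score)   # Newton-Raphson update
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Hypothetical labeled set: one surrogate feature x, follow-up C ~ Uniform(0, 10).
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
T = rng.exponential(size=n) / np.exp(-1.5 + x)   # event hazard increases with x
C = rng.uniform(0.0, 10.0, size=n)
delta = (T <= C).astype(float)

t, b = 5.0, 2.0
u = (C - t) / b
kernel = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0) / b
weights = kernel / 0.1                           # Uniform(0, 10) density is 0.1

W = np.column_stack([np.ones(n), x])             # leading 1 gives the intercept
theta_hat = solve_weighted_logistic(W, delta, weights)
pi_hat = expit(W @ theta_hat)                    # imputed risks at time t
print("theta_hat:", theta_hat)
```

The leading column of ones in `W` is what delivers the calibration property in (4): the first component of the score is exactly the weighted difference between labels and imputed risks.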
Step 3: Estimate $F(t)$ by marginalizing imputed risks

Finally, we marginalize the imputed values $\hat\pi_{t,i} = \hat\pi(t \mid \mathbf{W}_i)\ \forall\, \mathcal{F}_i = 1$ and $\Delta_i = 0\ \forall\, \mathcal{F}_i = 0$ to estimate $F(t)$. Since $\mathcal{F}$ depends on $C$, we again employ IPCW to marginalize $\hat\pi_{t,i}\mathcal{F}_i + \Delta_i(1 - \mathcal{F}_i) = \hat\pi_{t,i}\mathcal{F}_i$ and thereby construct our final estimator for $F(t)$:

$$\hat F(t) = \frac{\sum_{i=1}^{N}\{\hat\pi_{t,i}\mathcal{F}_i + \Delta_i(1 - \mathcal{F}_i)\}\,\hat\omega_{t,h_N}(C_i \mid \mathbf{Z}_{0,i})}{\sum_{i=1}^{N}\hat\omega_{t,h_N}(C_i \mid \mathbf{Z}_{0,i})} = \frac{\sum_{i=1}^{N}\hat\pi_{t,i}\mathcal{F}_i\,\hat\omega_{t,h_N}(C_i \mid \mathbf{Z}_{0,i})}{\sum_{i=1}^{N}\hat\omega_{t,h_N}(C_i \mid \mathbf{Z}_{0,i})}.$$

Inference for $\hat F(t)$

Following standard theory for non-parametric kernel regression (Pagan and Ullah), we show in the Supplementary Materials that $\hat F(t) \to P(T \le t, \mathcal{F} = 1) + P(T \le t, \mathcal{F} = 0) = P(T \le t) = F(t)$ in probability under mild regularity conditions and correct specification of the censoring model, regardless of the adequacy of the imputation model. Here, we note that for any $t \in [0, \tau]$, $0 = P(\Delta = 1 \mid \mathcal{F} = 0, C = t, \mathbf{W})$ implies that $P(T \le t \mid \mathcal{F} = 0) = 0$. Furthermore,

$$(n h_n)^{1/2}\{\hat F(t) - F(t)\} = \left(\frac{h_n}{n}\right)^{1/2}\sum_{i=1}^{n}\hat\omega_{t,h_n}(C_i \mid \mathbf{Z}_{0,i})\,\mathcal{F}_i\{\Delta_i - \bar\pi(t \mid \mathbf{W}_i)\} + o_p(1) = \left(\frac{h_n}{n}\right)^{1/2}\sum_{i=1}^{n}\omega_{t,h_n}(C_i \mid \mathbf{Z}_{0,i})\,\mathcal{F}_i\{\Delta_i - \bar\pi(t \mid \mathbf{W}_i)\} + o_p(1),$$

since $\sup_t |\hat f(t \mid \mathbf{Z}_0) - f(t \mid \mathbf{Z}_0)| = o_p(1)$. It follows that $(n h_n)^{1/2}\{\hat F(t) - F(t)\}$ is asymptotically normal with mean 0 and variance

$$\sigma^2(t) = R(K)\,E\{v(t \mid \mathbf{Z}_{0,i})/f(t \mid \mathbf{Z}_{0,i})\},$$

where $v(t \mid \mathbf{Z}_{0,i}) = E[\mathcal{F}_i\{I(T_i \le t) - \bar\pi(t \mid \mathbf{W}_i)\}^2 \mid \mathbf{Z}_{0,i}]$ and $R(K) = \int K(x)^2\,dx$. Our derivation of the asymptotic distribution of $\hat F(t)$ can effectively ignore the variability associated with the estimation of the censoring weights, which simplifies the asymptotic variance $\sigma^2(t)$.
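The Step 3 estimator above is a kernel-weighted ratio, which the following sketch makes concrete. It assumes a Gaussian kernel and takes the imputed risks and the estimated censoring densities as precomputed inputs; the function and argument names are hypothetical, not the package API.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the smoothing kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def scornet_risk(t, times, imputed_risks, filter_flags, densities, h):
    """IPCW-marginalized risk: sum_i w_i * F_i * pi_hat_i / sum_i w_i.

    times          -- follow-up times C_i for all N patients
    imputed_risks  -- pi_hat(t | W_i); the value is ignored when the filter flag is 0
    filter_flags   -- filter indicators (1 = filter-positive, 0 = filter-negative)
    densities      -- estimated censoring densities f_hat for each patient
    h              -- bandwidth h_N
    """
    numerator = 0.0
    denominator = 0.0
    for c, pi_hat, flag, f_hat in zip(times, imputed_risks, filter_flags, densities):
        weight = gaussian_kernel((c - t) / h) / (h * f_hat)
        numerator += weight * flag * pi_hat  # filter-negative patients contribute 0
        denominator += weight
    return numerator / denominator
```

Filter-negative patients enter only through the denominator, which reflects the assumption that their event indicator is zero.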
Importantly, $\sigma^2(t)$ decreases as the imputation model approximates $\pi(t \mid \mathbf{W}_i)$ better, since $v(t \mid \mathbf{Z}_{0,i}) = E[\mathcal{F}_i\{I(T_i \le t) - \pi(t \mid \mathbf{W}_i)\}^2 \mid \mathbf{Z}_{0,i}] + E[\mathcal{F}_i\{\pi(t \mid \mathbf{W}_i) - \bar\pi(t \mid \mathbf{W}_i)\}^2 \mid \mathbf{Z}_{0,i}]$ decreases. To estimate $\sigma^2(t)$ in practice, one may construct a plug-in estimator,

$$\hat\sigma^2(t) = \frac{h_n}{n}\sum_{i=1}^{n}\hat\omega_{t,h_n}(C_i \mid \mathbf{Z}_{0,i})^2\,\mathcal{F}_i\,\{\Delta_i - g(\hat{\boldsymbol{\theta}}(t)^{\mathsf{T}}\vec{\mathbf{W}}_i)\}^2.$$

By contrast, the supervised IPCW estimator that incorporates filter-negative patients takes the form

$$\hat F_{\mathrm{IPCW}}(t) = \frac{\sum_{i=1}^{N}\{\hat\pi(t)\mathcal{F}_i + \Delta_i(1 - \mathcal{F}_i)\}\,\hat\omega_{t,h_N}(C_i \mid \mathbf{Z}_{0,i})}{\sum_{i=1}^{N}\hat\omega_{t,h_N}(C_i \mid \mathbf{Z}_{0,i})} = \frac{\sum_{i=1}^{N}\hat\pi(t)\mathcal{F}_i\,\hat\omega_{t,h_N}(C_i \mid \mathbf{Z}_{0,i})}{\sum_{i=1}^{N}\hat\omega_{t,h_N}(C_i \mid \mathbf{Z}_{0,i})}.$$

The asymptotic variance of $(n h_n)^{1/2}\{\hat F_{\mathrm{IPCW}}(t) - F(t)\}$ is then $\sigma^2_{\mathrm{IPCW}}(t) = R(K)\,E\{v_{\mathrm{IPCW}}(t \mid \mathbf{Z}_{0,i})/f(t \mid \mathbf{Z}_{0,i})\}$, where $v_{\mathrm{IPCW}}(t \mid \mathbf{Z}_{0,i}) = E[\mathcal{F}_i\{I(T_i \le t) - \pi(t)\}^2 \mid \mathbf{Z}_{0,i}]$. The variance $\sigma^2_{\mathrm{IPCW}}(t)$ is equivalent to that of SCORNET if and only if the feature set $\mathbf{W}$ is uninformative for $T$ (i.e. $T \perp \mathbf{W}$). Supervised IPCW is otherwise less efficient, with relative efficiency controlled by the relative magnitudes of the marginal error $E[\{I(T \le t) - F(t)\}^2 \mid \mathcal{F} = 1, \mathbf{Z}_{0,i}]$ and the conditional error $E[\{I(T \le t) - \bar\pi(t \mid \mathbf{W})\}^2 \mid \mathcal{F} = 1, \mathbf{Z}_{0,i}]$.

Simulation study

We conduct extensive simulation experiments to evaluate the finite sample performance of the proposed SCORNET estimator in realistic settings with n ∈ { , } observed labels within the set of filter-positive patients, defining the filter to have % sensitivity and % specificity for Δ.
We compare SCORNET to three existing survival function estimators with current status data: (1) parametric Weibull accelerated failure time (AFT) regression with interval event times (Lin and others), (2) semi-parametric Cox proportional hazards regression with interval event times and Breslow baseline hazard estimation (Huang; Cox; Breslow), and (3) non-parametric IPCW estimation (van der Laan and Robins). We incorporate the filter in the Weibull and Cox models by setting $\Delta \mid (\mathcal{F} = 0) = 0$ and weighting the $n$ labeled filter-positive patients by $\frac{1}{n}\sum_{i=1}^{N}\mathcal{F}_i$. Weibull and Cox are implemented using the icenReg package in R, while IPCW is implemented per the algorithm detailed in van der Laan and Robins, estimating $C \mid \mathbf{Z}_0$ using $\mathcal{D}$ under the Cox model. We note that estimating the censoring distribution using $\mathcal{L}$ yields similar asymptotic performance to using $\mathcal{D}$, but in finite sample settings the latter offers higher efficiency.

| Setting | $\mathbf{Z}_0 \sim$ | $C \mid \mathbf{Z}_0 \sim$ | $T \mid \mathbf{Z}_0 \sim$ | $\mathbf{Z} \sim$ |
|---|---|---|---|---|
| 1 | Unif(− , ) | Weibull( − . $\mathbf{Z}_0$, ) | Weibull( − . $\mathbf{Z}_0$, ) | Normal{$T$ + , $\sigma(C)$/ } |
| 2 | Unif(− , ) | Weibull( − . $\mathbf{Z}_0$, ) | Weibull( , ) | Normal{$T$ + , $\sigma(C)$/ } |
| 3 | Unif(− , ) | Weibull( − . $\mathbf{Z}_0$, ) | Weibull( − . $\mathbf{Z}_0$, ) | Normal{$T$ + , $\sigma(C)$/ } |
| 4 | Unif(− , ) | Weibull( − . $\mathbf{Z}_0$, ) | Logistic( − $\mathbf{Z}_0$, ) | Normal{$T$ + , $\sigma(C)$/ } |
| 5 | Unif(− , ) | Weibull( , ) | Weibull( − . $\mathbf{Z}_0$, ) | Normal{$T$ + , $\sigma(C)$/ } |
| 6 | Unif(− , ) | Weibull( − . $\mathbf{Z}_0$, ) | Weibull( − . $\mathbf{Z}_0$, ) | Normal{$T$ + , $\sigma(C)$/ } |

Table : Generative parameters employed in our simulation study.
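To make the generative structure in the table concrete, the sketch below draws one synthetic cohort with a uniform baseline covariate, Weibull follow-up and event times that depend on it, a noisy surrogate centred near the event time, and a filter defined by its sensitivity and specificity for Δ. Every numeric value (Weibull scale and shape, the surrogate shift and noise, the filter specificity) is a hypothetical placeholder, since the parameters actually used are not recoverable from the table as extracted.

```python
import math
import random


def simulate_patient(rng, sensitivity=1.0, specificity=0.8):
    """Draw one patient; all numeric settings here are illustrative assumptions."""
    z0 = rng.uniform(-1.0, 1.0)                          # baseline covariate
    scale = math.exp(0.5 - 0.5 * z0)                     # hypothetical dependence on z0
    c = rng.weibullvariate(scale, 2.0)                   # follow-up (censoring) time
    t = rng.weibullvariate(scale, 2.0)                   # latent event time, never observed
    z = rng.gauss(t + 1.0, 0.5)                          # noisy surrogate of the event time
    delta = int(t <= c)                                  # current status label
    p_positive = sensitivity if delta == 1 else 1.0 - specificity
    filter_flag = int(rng.random() < p_positive)
    return {"z0": z0, "c": c, "z": z, "delta": delta, "filter": filter_flag}


def simulate_cohort(n_total, seed=0):
    """Simulate the full cohort; labels would be revealed for only a few filter-positives."""
    rng = random.Random(seed)
    return [simulate_patient(rng) for _ in range(n_total)]
```

With the default sensitivity of 1.0, every patient with Δ = 1 is filter-positive, which matches the assumption that filter-negative patients have not had the event by their follow-up time.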
We consider diverse generative mechanisms as detailed in Table , including cases where Weibull-distributed accelerated failure time of $T \mid \mathbf{Z}_0$, proportional hazards of $T \mid \mathbf{Z}_0$, and proportional hazards of $C \mid \mathbf{Z}_0$ are respectively violated, as well as cases where SCORNET's imputation model is and is not misspecified. In settings , , and , we consider various cases where SCORNET and all comparator methods are correctly specified. In setting we consider the specific case where $C$ and $T$ both depend on $\mathbf{Z}_0$, and both $C \mid \mathbf{Z}_0$ and $T \mid \mathbf{Z}_0$ are Weibull-distributed, satisfying accelerated failure time and proportional hazards. In setting , by contrast, we consider a case where $T \perp \mathbf{Z}_0$ to assess robustness to over-parametrization of this relationship, and in setting we consider a case where $C \perp \mathbf{Z}_0$ to evaluate robustness to over-parametrization thereof. In settings and we assess the benefit of SCORNET and IPCW's robustness to the distribution of $T \mid \mathbf{Z}_0$ when this distribution satisfies neither Weibull accelerated failure time nor proportional hazards. We evaluate SCORNET's sensitivity to misspecification of the imputation model in settings , , and , as compared to correct specification thereof in settings and . Finally, in setting we assess the sensitivity of SCORNET and IPCW to misspecification of the conditional censoring model $C \mid \mathbf{Z}_0$.

For each given configuration, we compute the empirical bias, standard error, and root mean squared error (RMSE) of all estimators for $F(t)$ based on their average performance on simulated datasets evaluated at equally-spaced time points t ∈ [Q_C( . ) + h_n, Q_C( . ) − h_n], where $Q_C$ denotes the quantile function of $C$ under the configuration. We used plug-in bandwidths of h_n = ŝ(C) n^(− / ) and h_N = ŝ(C) N^(− / ) for the imputation (Step 2) and marginalization (Step 3) steps of SCORNET respectively, where ŝ is the empirical standard deviation of observed $C$. We present the performance of the estimators averaged over the selected time points using n = labels in Figure . The performance at each time point can be found in Supplementary Figure , and time-averaged performance using n = labels can be found in Supplementary Table of the Supplementary Materials.

Figure : Time-averaged empirical absolute biases (left), standard errors (second from left), root relative efficiencies (second from right), and relative RMSEs (right) of the Weibull accelerated failure time (red), Cox proportional hazards w/ Breslow baseline (blue), supervised IPCW (green), and SCORNET estimators using weakly informative (purple) and strongly informative (orange) surrogates, in various simulated settings with n = observed current status labels.

As Figure demonstrates, imputing using a strongly informative feature $\mathbf{Z}$ (SCORNET-Strong) results in consistently higher efficiency than just using the weakly informative baseline $\mathbf{Z}_0$ (SCORNET-Weak), which in turn is markedly more efficient than not leveraging the unlabeled set at all (IPCW). SCORNET makes minimal assumptions regarding the distribution of $T \mid \mathbf{Z}_0$, settling for non-parametric efficiency in exchange for enhanced flexibility.
By contrast, the Weibull regression model fully parametrizes $T \mid \mathbf{Z}_0$, and the Cox model assumes proportional hazards thereof, increasing efficiency at the expense of bias in the case of misspecification. As expected, Weibull consistently achieves higher empirical efficiency than Cox, which in turn is more efficient than IPCW across settings. Notably, SCORNET consistently achieves empirical efficiency comparable to Weibull and significantly higher than Cox despite being far more flexible than both, again highlighting the efficiency gained by leveraging auxiliary information to impute unobserved risks. At the same time, SCORNET is much less susceptible to model misspecification bias than Weibull, as demonstrated by the latter's significantly higher bias and RMSE in setting . Indeed, SCORNET achieves relatively low mean absolute bias across settings, with MSE apparently dominated by variance rather than bias in the setting of – labels. Consistent with the theory, SCORNET is robust to misspecification of the imputation model in settings , , and , achieving equivalently insignificant bias as in setting and marginally but not meaningfully higher bias than in setting . That said, correctness of the imputation model in settings and does not yield any meaningful change in relative efficiency, likely because inherent variability functionally dominates imputation model bias given so few labels. Reassuringly, SCORNET (and IPCW) appear insensitive to misspecification of $C \mid \mathbf{Z}_0$ in setting , achieving functionally equivalent bias to the correctly-specified Weibull and Cox models.

Altogether, these results corroborate the assertion that SCORNET's semi-supervised utilization of informative feature data to impute risks in the unlabeled set improves estimation efficiency without introducing bias regardless of the validity of the imputation model. Moreover, they suggest that SCORNET is particularly valuable in settings where (1) flexibility is desired with regard to the distribution of $T \mid \mathbf{Z}_0$, and (2) there exists a large set of unlabeled patients with associated EHR data, both commonplace in retrospective observational studies.

Figure : Empirical coverage probabilities averaged over time (left) and plotted over time (right) of SCORNET-Strong's % confidence intervals constructed with the bootstrap (red) and plug-in (blue) standard error estimators in various simulated settings with n = observed current status labels. See Table for details of the generative mechanism employed in each setting.

To assess the finite sample performance of the proposed interval estimation procedures, we obtain standard error estimates both using the proposed plug-in estimator $\hat\sigma(t)$ and via bootstrap with replicates. In Figure we demonstrate empirical coverage probabilities of SCORNET's % Wald confidence intervals constructed using each standard error estimator, both averaged over the selected timepoints (left) and at each timepoint (right). Reassuringly, we find that the % confidence intervals using both plug-in and bootstrap estimators achieve nearly % mean coverage across settings. Coverage only drops below % at the tails of the event time support due to moderately increased bias from kernel smoothing thereabout. The plug-in estimator achieves marginally lower coverage than the bootstrap estimator at the right tail due to underestimation of the true standard error, likely because of overfitting of the imputation model given low local censoring density (and thus low effective $N$). Notably, we do not observe this trend in setting , wherein correct specification of the imputation model obviates overfitting. Thus, we posit that the plug-in estimator can be reliably used for finite sample problems with n ∈ [ , ] labels as long as the conditional censoring density $f(C \mid \mathbf{Z}_0)$ is sufficiently high and the timepoints evaluated are sufficiently far from the tails of the event time support.

Application to assessing heart failure risk among rheumatoid arthritis patients

Rheumatoid arthritis (RA), a chronic inflammatory disease that affects approximately % of the general population, is associated with dramatically increased risk of heart failure (HF) morbidity and mortality (Kaplan; Nicola and others; Ahlers and others). One study estimated that RA patients have a . -fold lifetime risk of developing HF compared to matched RA-negative controls (Nicola and others), while another estimated that HF accounts for % of excess mortality among RA patients (Nicola and others). Ongoing interest lies in estimating the risk of developing HF subtypes in RA cohorts and quantifying the risk modifying effect of various RA treatments (Ahlers and others). Due to the increased availability of electronic health record (EHR) data, it is now possible to assess HF risk for a broader RA population using these data. For example, at Mass General Brigham we previously established an EHR cohort of N = , RA patients (Huang and others). This large RA cohort can potentially be used to study the longitudinal risk of HF among RA patients.
However, such analysis is not straightforward as HF status is not readily available within the RA cohort. We propose to estimate HF risk among RA patients by leveraging (1) n = current status labels on HF status obtained via manual chart review, and (2) informative yet unlabeled EHR data, including time to first ICD code for HF, as surrogate variables $\mathbf{Z}$. We estimate both the age-specific HF risk, $F_{\mathrm{age}}(\cdot)$, and the risk of developing HF after the patient's incident ICD code for RA, $F_{\mathrm{RA+}}(\cdot)$, among patients with at least months of follow up whose incident RA codes occur after the age of , to select for adult-onset as opposed to juvenile RA. Among filter-positive patients, defined as having at least one ICD code for HF, we have n = labels on censoring time HF status Δ for age-specific HF risk, and we have n = for post-RA HF risk. We let the baseline covariates $\mathbf{Z}_0$ include sex and decade of first EHR event for $F_{\mathrm{age}}(\cdot)$, and sex, decade of first RA code, and age at first RA code for $F_{\mathrm{RA+}}(\cdot)$. We obtain HF risk estimators based on SCORNET as well as the aforementioned comparator estimators.
For the imputation model in Step 2 of SCORNET, we consider three EHR-derived surrogate risk predictors for $\mathbf{Z}$: (1) the predicted Δ based on the unsupervised multimodal automated phenotyping (MAP) algorithm, which uses the total counts of HF ICD codes and mentions of HF in clinical notes, as well as the total count of all ICD codes as a healthcare utilization measure (Liao and others); (2) the predicted Δ based on the unsupervised surrogate-guided ensemble latent Dirichlet allocation (sureLDA) algorithm, which leverages the features used in MAP as well as additional manually-selected EHR features including counts of relevant medications, ICD codes, and concept unique identifiers (CUIs) in clinical notes (Ahuja and others, b); and (3) the time to first HF ICD code. As in our simulation, we select plug-in bandwidths of h_n = ŝ(C) n^(− / ) and h_N = ŝ(C) N^(− / ) for the imputation and marginalization steps of SCORNET respectively, and we evaluate risk at timepoints t ∈ [Q_C( . ) + h_n, Q_C( . ) − h_n]. We again compare the performance of SCORNET to that of Weibull, Cox, and IPCW, incorporating the filter in the Weibull and Cox models by propensity weighting as we do in the simulation study.

In Figure , we show the estimated HF risk curves along with their standard errors. Reassuringly, all methods appear to agree rather closely for estimation of both age-specific HF risk and HF risk after RA diagnosis. For the latter quantity, however, Weibull and Cox appear to underfit while IPCW appears to overfit relative to the SCORNET estimator, which appears to achieve a reasonable middle ground. Moreover, SCORNET once again attains standard errors comparable to those of the Weibull estimator and meaningfully lower than those of the Cox and IPCW estimators. This suggests that while the Weibull and Cox models potentially fail to capture the complexity of the post-RA HF risk function, and IPCW is too unstable for a limited labeled set of size n = , SCORNET offers an attractive balance of efficiency and flexibility and is thus well conditioned for such a scenario. As shown in Figure , averaged over the selected timepoints, the root relative efficiency of SCORNET is . , . , and . compared to the Weibull, Cox, and IPCW estimators respectively for estimation of age-specific risk, and . , . , and . respectively for estimation of HF risk after RA diagnosis. Once again, the fact that SCORNET achieves efficiency moderately higher than the relatively inflexible Weibull model and significantly higher than the Cox and IPCW estimators reflects the value of leveraging available information from the EHR to bolster risk estimation efficiency.

Figure : Estimated age-specific and post-RA cumulative risks of heart failure (top) and bootstrap standard errors thereof (bottom) over time of the Weibull accelerated failure time (red, short-long-dashed), Cox proportional hazards w/ Breslow baseline (blue, dot-dashed), supervised IPCW (green, dashed), and SCORNET (purple, solid) estimators.
Figure : Time-averaged bootstrap standard errors (left) and empirical root relative efficiencies (right) of the Weibull accelerated failure time (red), Cox proportional hazards w/ Breslow baseline (blue), IPCW (green), and SCORNET (purple) estimators for estimation of (1) age-specific HF risk (left), and (2) HF risk after RA diagnosis (right), among RA patients in the Partners EHR database.

Discussion

By leveraging a sizeable unlabeled dataset containing imperfect surrogates of the true event times and a small set with observed current status labels, the SCORNET estimator serves as a robust and efficient alternative to existing model-free survival estimators with current status data. The semi-supervised nature of SCORNET makes it well-conditioned to EHR-based survival estimation in settings where only a limited number of labels are available or readily obtainable. Moreover, by only requiring current status labels rather than the precise timing of event onset, SCORNET greatly reduces the burden of chart review and increases the feasibility of studying disease risk using EHR data.

To allow for covariate-dependent censoring, which is frequently present in observational settings, SCORNET requires additional assumptions on the distribution of $C \mid \mathbf{Z}_0$. Although we choose the proportional hazards model for illustration, any standard semi-parametric model will yield similar properties for the resulting estimator. Since $\{C, \mathbf{Z}_0\}$ are observed for all subjects, one can potentially allow for more flexible (i.e. non-parametric) censoring models.
That said, our simulation results suggest that SCORNET is relatively insensitive to misspecification of the model for $C \mid \mathbf{Z}_0$. Even under mild misspecification, it achieves consistently lower mean squared errors than existing estimators.

When interest lies in assessing how risk differs across different patient sub-populations, it is straightforward to extend SCORNET to estimate subgroup-specific risks for a small number of subgroups. However, future research is warranted to estimate covariate-specific risks for a general set of covariates.

Software

An R package, including a sample use case and complete documentation, is available at https://cran.r-project.org/web/packages/scornet/index.html. Source code can be found at https://github.com/celehs/scornet.

Funding

This work was supported by the U.S. National Institutes of Health grants T -AR , T -GM , and R -CA .

Acknowledgements

The authors declare no conflicts of interest.

References

Ahlers, Michael J., Lowery, Brandon D., Farber-Eger, Eric, Wang, Thomas J., Bradham, William, Ormseth, Michelle J., Chung, Cecilia P., Stein, C. Michael and Gupta, Deepak K. Heart failure risk associated with rheumatoid arthritis-related chronic inflammation. Journal of the American Heart Association.
Ahuja, Yuri, Hong, Chuan, Xia, Zongqi and Cai, Tianxi. (a). SAMGEP: a novel method for prediction of phenotype event times using the electronic health record. Preprint.
Ahuja, Yuri, Zhou, Doudou, He, Zeling, Sun, Jiehuan, Castro, Victor M, Gainer, Vivian, Murphy, Shawn N, Hong, Chuan and Cai, Tianxi. (b).
SureLDA: a multidisease automated phenotyping method for the electronic health record. Journal of the American Medical Informatics Association.
Bair, Eric and Tibshirani, Robert. Semi-supervised methods to predict patient survival from gene expression data. PLoS Biology.
Basha, Lule and Hoxha, Fatmir. Kernel estimation of the baseline function in the Cox model. European Scientific Journal.
Breslow, Norman E. Discussion of Professor Cox's paper. Journal of the Royal Statistical Society, Series B.
Chai, Hua, Li, Zi-Na, Meng, De-Yu, Xia, Liang-Yong and Liang, Yong. A new semi-supervised learning model combined with Cox and SP-AFT models in cancer survival analysis. Scientific Reports.
Choi, Edward, Du, Nan, Chen, Robert, Song, Le and Sun, Jimeng. Constructing disease network and temporal progression model via context-sensitive Hawkes process. IEEE Computer Society.
Chubak, Jessica, Yu, Onchee, Pocobelli, Gaia, Lamerato, Lois, Webster, Joe, Prout, Marianna N, Yood, Marianne Ulcickas, Barlow, William E and Buist, Dianna SM. Administrative data algorithms to identify second breast cancer events following early-stage invasive breast cancer. Journal of the National Cancer Institute.
Cipparone, Charlotte W, Withiam-Leitch, Matthew, Kimminau, Kim S, Fox, Chet H, Singh, Ranjit and Kahn, Linda. Inaccuracy of ICD-9 codes for chronic kidney disease: a study from two practice-based research networks (PBRNs). The Journal of the American Board of Family Medicine.
Cox, David R. Regression models and life-tables.
Journal of the Royal Statistical Society. Series B.
Dean, Bonnie B, Lam, Jessica, Natoli, Jaime L, Butler, Qiana, Aguilar, Daniel and Nordyke, Robert J. Use of electronic medical records for health outcomes research: a literature review. Medical Care Research and Review.
Golub, T.R., Slonim, D.K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J.P., Coller, H., Loh, M.L., Downing, J.R., Caligiuri, M.A., Bloomfield, C.D. and others. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science.
Hassett, Michael J, Uno, Hajime, Cronin, Angel M, Carroll, Nikki M, Hornbrook, Mark C and Ritzwoller, Debra. Detecting lung and colorectal cancer recurrence using structured clinical/administrative data to enable outcomes research and population health management. Medical Care.
Hodgkins, Adam J, Bonney, Andrew, Mullan, Judy, Mayne, Darren John and Barnett, Stephen. Survival analysis using primary care electronic health record data: a systematic review of the literature. Health Information Management Journal.
Hripcsak, George and Albers, David J. Next-generation phenotyping of electronic health records. Journal of the American Medical Informatics Association.
Huang, Jian. Efficient estimation for the proportional hazards model with interval censoring. The Annals of Statistics.
Huang, Sicong, Huang, Jie, Cai, Tianrun, Dahal, Kumar P, Cagan, Andrew, He, Zeling, Stratton, Jacklyn, Gorelik, Isaac, Hong, Chuan, Cai, Tianxi and others.
Impact of ICD10 and secular changes on electronic medical record rheumatoid arthritis algorithms. Rheumatology.
Kaji, Deepak A, Zech, John R, Kim, Jun S, Cho, Samuel K, Dangayach, Neha S, Costa, Anthony B and Oermann, Eric K. An attention based deep learning model of clinical events in the intensive care unit. PLoS ONE.
Kaplan, Mariana J. Cardiovascular complications of rheumatoid arthritis: assessment, prevention, and treatment. Rheumatic Disease Clinics of North America.
Kohane, Isaac S, Churchill, Susanne E and Murphy, Shawn N. A translational engine at the national scale: informatics for integrating biology and the bedside. Journal of the American Medical Informatics Association.
Liang, Yong, Chai, Hua, Liu, Xiao-Ying, Xu, Zong-Ben, Zhang, Hai and Leung, Kwong-Sak. Cancer survival analysis using semi-supervised learning method based on Cox and AFT models with L1/2 regularization. BMC Medical Genomics.
Liao, Katherine P, Sun, Jiehuan, Cai, Tianrun A, Link, Nicholas, Hong, Chuan, Huang, Jie, Huffman, Jennifer E, Gronsbell, Jessica, Zhang, Yichi, Ho, Yuk-Lam, Castro, Victor, Gainer, Vivian, Murphy, Shawn N, O'Donnell, Christopher J, Gaziano, J Michael, Cho, Kelly, Szolovits, Peter, Kohane, Isaac S, Yu, Sheng and others. High-throughput multimodal automated phenotyping (MAP) with application of PheWAS. Journal of the American Medical Informatics Association.
Lin, Hung-Mo, Williamson, John M and Kim, Hae-Young. Firth adjustment for Weibull current-status survival analysis. Communications in Statistics - Theory and Methods.
Liu, Bin, Li, Ying, Sun, Zhaonan, Ghosh, Soumya and Ng, Kenney. Early prediction of diabetes complications from electronic health records: a multi-task survival analysis approach. In: The 32nd AAAI Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence.
Miotto, Riccardo, Li, Li, Kidd, Brian A and Dudley, Joel T. Deep Patient: an unsupervised representation to predict the future of patients from the electronic health records. Scientific Reports.
Nicola, Paulo J., Crowson, Cynthia S., Maradit-Kremers, Hilal, Ballman, Karla V., Roger, Veronique L., Jacobsen, Steven J. and Gabriel, Sherine E. Contribution of congestive heart failure and ischemic heart disease to excess mortality in rheumatoid arthritis. Arthritis Rheumatology.
Nicola, Paulo J., Maradit-Kremers, Hilal, Roger, Veronique L., Jacobsen, Steven J., Crowson, Cynthia S., Ballman, Karla V. and Gabriel, Sherine E. The risk of congestive heart failure in rheumatoid arthritis: a population-based study over 46 years. Arthritis Rheumatology.
Pagan, Adrian and Ullah, Aman. Nonparametric Econometrics. Cambridge University Press.
Panahiazar, Maryam, Taslimitehrani, Vahid, Pereira, Naveen and Pathak, Jyotishman. Using EHRs and machine learning for heart failure survival analysis. Studies in Health Technology and Informatics.
Rotnitzky, Andrea and Robins, James M. Inverse probability weighting in survival analysis. Wiley StatsRef: Statistics Reference Online.
Ruan, Tong, Lei, Liqi, Zhou, Yangming, Zhai, Jie, Zhang, Le, He, Ping and Gao, Ju.
Representation learning for clinical time series prediction tasks in electronic health records. BMC Medical Informatics and Decision Making.
Seaman, Shaun R and White, Ian R. Review of inverse probability weighting for dealing with missing data. Statistical Methods in Medical Research.
Steele, Andrew J, Denaxas, Spiros C, Shah, Anoop D, Hemingway, Harry and Luscombe, Nicholas M. Machine learning models in electronic health records can outperform conventional survival models for predicting patient mortality in coronary artery disease. PLoS ONE.
Uno, Hajime, Ritzwoller, Debra P, Cronin, Angel M, Carroll, Nikki M, Hornbrook, Mark C and Hassett, Michael J. Determining the time of cancer recurrence using claims or electronic medical record data. JCO Clinical Cancer Informatics.
van der Laan, Mark J and Jewell, Nicholas P. Current status and right-censored data structures when observing a marker at the censoring time. The Annals of Statistics.
van der Laan, Mark J and Robins, James M. Locally efficient estimation with current status data and time-dependent covariates. Journal of the American Statistical Association.
Vardi, Y. Nonparametric estimation in the presence of length bias. Annals of Statistics.
Zhao, Yue, Herring, Amy H, Zhou, Haibo, Ali, Mirza W and Koch, Gary G. A multiple imputation method for sensitivity analyses of time-to-event data with possibly informative censoring. Journal of Biopharmaceutical Statistics.

A global cancer data integrator reveals principles of synthetic lethality, sex disparity and immunotherapy.
christopher yogodzinski , ,#*, abolfazl arab - , justin r. pritchard , hani goodarzi - , luke a. gilbert , , * department of urology, university of california, san francisco, san francisco, ca, usa helen diller family comprehensive cancer center, san francisco, san francisco, ca, usa department of biochemistry and biophysics, university of california, san francisco, ca, usa department of biomedical engineering, pennsylvania state university, university park, pa department of cellular & molecular pharmacology, university of california, san francisco, ca, usa # current address: university of north carolina chapel hill school of medicine, chapel hill, nc, usa *corresponding authors correspondence: cyogodzi@unc.edu (c.y.), luke.gilbert@ucsf.edu (l.a.g) (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . abstract advances in cancer biology are increasingly dependent on integration of heterogeneous datasets. large scale efforts have systematically mapped many aspects of cancer cell biology; however, it remains challenging for individual scientists to effectively integrate and understand this data. we have developed a new data retrieval and indexing framework that allows us to integrate publicly available data from different sources and to combine publicly available data with new or bespoke datasets. beyond a database search, our approach empowered testable hypotheses of new synthetic lethal gene pairs, genes associated with sex disparity, and immunotherapy targets in cancer. our approach is straightforward to implement, well documented and is continuously updated which should enable individual users to take full advantage of efforts to map cancer cell biology. 
introduction

large scale but often independent efforts have mapped phenotypic characteristics of more than one thousand human cancer cell lines. despite this, static lists of univariate data generally cannot identify the underlying molecular mechanisms driving a complex phenotype. we hypothesized that a global cancer data integrator that could incorporate many types of publicly available data, including functional genomics, whole genome sequencing, exome sequencing, rna expression data, protein mass spectrometry, dna methylation profiling, chip-seq, atac-seq and metabolomics data, would enable us to link disease features to gene products. we set out to build a resource that enables cross-platform correlation analysis of multi-omic data, as this analysis is in and of itself a high-resolution phenotype. multi-omic analysis of functional genomics data with genomic, metabolomic or transcriptomic profiling can link cell state or specific signaling pathways to gene function. lastly, co-essentiality profiling across large panels of cell lines has revealed protein complexes and co-essential modules that can assign function to uncharacterized genes.

problematically, in many cases publicly available data are poorly integrated when considering information on all genes across different types of data, and the existing data portals are inflexible. for example, lists of genes cannot be queried against groups of cell lines stratified by mutation status or disease subtype. furthermore, one cannot integrate new data derived from individual labs or other consortia.
we created the cancer data integrator (candi), which is a series of python modules designed to seamlessly integrate genomic, functional genomic, rna, protein and metabolomic data into one ecosystem. our python framework operates like a relational database without the overhead of running mysql or postgres and enables individual users to easily query this vast dataset and add new data in flexible ways. this was achieved by unifying the indices of these datasets via index tables that are automatically accessed through candi’s biologically relevant python classes. we highlight the utility of candi through four types of analysis to demonstrate how complex queries can reveal previously unknown molecular mechanisms in synthetic lethality, sex disparity and immunotherapy. these data nominate new small molecule and immunotherapy anti-cancer strategies in kras-mutant colon, lung and pancreatic cancers.

results

candi is a global cancer data integrator. we set out to integrate three types of data by creating programmatic and biologically relevant abstractions that allow for flexible cross referencing across all datasets. data from the cancer cell line encyclopedia (ccle) for rna expression, dna mutation, dna copy number and chromosome fusions across more than cancer cell lines was integrated into our database with the functional genomics data from the cancer dependency map (depmap) (fig. a,b and supplementary fig. ). we also integrated protein-protein interaction data from the corum database along with three additional distinct protein localization databases.
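the idea of unifying dataset indices through index tables can be illustrated with a minimal sketch. this is not candi's actual code or api: the table layout, the identifiers and all values below are hypothetical, and plain dictionaries stand in for the pandas tables that the library is built on.

```python
# hypothetical sketch of unifying datasets through a shared index table;
# every identifier and value here is made up for illustration.

# index table: one entry per cell line, mapping a canonical id to the
# identifier that each source dataset uses for that cell line.
cell_line_index = {
    "line_a": {"expression_id": "A_LUNG", "essentiality_id": "a-lung"},
    "line_b": {"expression_id": "B_COLON", "essentiality_id": "b-colon"},
}

# two source datasets keyed by their own, mutually incompatible identifiers
expression = {"A_LUNG": {"GENE1": 5.2}, "B_COLON": {"GENE1": 1.1}}
essentiality = {"a-lung": {"GENE1": 12.0}, "b-colon": {"GENE1": -3.5}}


def lookup(canonical_id, gene):
    """return expression and essentiality of one gene in one cell line."""
    ids = cell_line_index[canonical_id]  # translate the id once, centrally
    return {
        "expression": expression[ids["expression_id"]][gene],
        "essentiality": essentiality[ids["essentiality_id"]][gene],
    }


record = lookup("line_a", "GENE1")
print(record)
```

because every query goes through the index table, a new dataset only needs one additional identifier column rather than pairwise mappings to each existing dataset.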
candi by default will access the most recent release of data from depmap, although users can also specify both the release and data type that is accessed. the key advantage to this approach is that candi enables one to easily input user-defined queries with multi-tiered conditional logic into this large integrated dataset to analyze gene function, gene expression, protein localization and protein-protein interactions.

candi identifies genes that are conditionally essential in brca-mutant ovarian cancer. the concept that loss-of-function tumor suppressor gene mutations can render cancer cells critically reliant on the function of a second gene is known as synthetic lethality. despite the promise of synthetic lethality, it has been challenging to predict or identify genes that are synthetic lethal with commonly mutated tumor suppressor genes. while there are many underlying reasons for this challenge, we reasoned that data integration through candi could identify synthetic lethal interactions missed by others. a paradigmatic example of synthetic lethality emerged from the study of dna damage repair (ddr). somatic mutations in the dna double-strand break (dsb) repair genes, brca / , create an increased dependence on dna single-strand break (ssb) repair. this dependence can be exploited through small molecule inhibition of parp-mediated ssb repair. inhibition of parp provides significant clinical responses in advanced breast and ovarian cancer patients, but they ultimately progress. thus, new synthetic lethal associations with brca / are a potential path towards therapeutic development for parp-refractory patients.
to illustrate the flexibility of candi to mine context-specific synthetic sick lethal (ssl) genetic relationships, we hypothesized that the genes that modulate response to a parp inhibitor might be enriched for genes that are selectively essential for proliferation or survival of brca / -mutant cancer cells. to test this hypothesis, we integrated the results of an existing crispr screen that identified genes that modulate response to the parp inhibitor olaparib. we then tested whether any of these genes are differentially essential for cell proliferation or survival in ovarian cancer and in breast cancer cell models that are either brca / proficient or deficient (fig. c,d). this query revealed that the fanconi anemia pathway is selectively essential in brca / -mutated ovarian cancer models but not in brca / -wildtype ovarian cancer, brca / -mutated breast cancer or brca / -wildtype breast cancer models (fig. e and supplementary table ). to our knowledge an ssl phenotype between fancm and brca / has never been reported, although a recent paper nominated a role for fancm and brca in telomere maintenance. importantly, fancm is a helicase/translocase and thus considered to be a druggable target for cancer therapy. clinical genomics data support this ssl hypothesis, although this remains to be tested in ovarian cancer patient samples. because the depmap currently only allows single genes to be queried and does not enable users to easily stratify cell lines by mutation, such analysis would normally take a user several days to complete manually. our approach enabled this analysis to be completed using a desktop computer in less than two hours, which includes the visualization of data presented here (fig. e).

figure .
(a) a schematic showing human cell models integrated by candi. (b) a schematic illustrating types of data integrated by candi. (c) a cartoon of a genome-scale crispri screen to identify genes that modulate response to parp inhibition by olaparib. (d) a schematic depicting data feature inputs parsed by candi. (e) essentiality of fanconi anemia genes in ovarian and breast cancer cell lines separated by brca mutation status. a bayes factor score of gene essentiality is displayed by a heat map. n= brca / -mutant ovarian cancer, n= brca-wildtype ovarian cancer, n= brca / -mutant breast cancer, n= brca / -wildtype breast cancer.

conditional genetic essentiality in kras- and egfr-mutant nsclc cells. beyond tumor suppressor genes (tsgs), many common driver oncogenes such as krasg d are currently undruggable, which motivates the search for oncogene-specific conditional genetic dependencies. we reasoned that candi would enable us to rapidly search functional genomics data for genes that are conditionally essential in lung cancer cells driven by kras and egfr mutations. we stratified non-small cell lung cancer (nsclc) cell models by egfr and kras mutations and then looked at the average gene essentiality for all genes within each of these subtypes of nsclc. we observed that kras is conditionally self-essential in kras-mutant cell models but that no other genes are conditionally essential in kras-mutant, egfr-mutant, kras-wildtype or egfr-wildtype cell models (fig. a,b and supplementary table ). this finding demonstrates that very few genes, if any, are synthetic lethal with kras or egfr in kras- and egfr-mutant lung cancer cell lines.
it may be that these experiments are underpowered, or it may be that when the genetic dependencies of diverse cell lines representing a disease subtype are averaged across a single variable (e.g. a kras mutation) very few common synthetic lethal phenotypes are observed. candi provides potential solutions for both of these hypotheses.

candi enables a global analysis of conditional essentiality in cancer. it is thought that data aggregation across vast landscapes of unknown covariates does not necessarily increase the statistical power to identify rare associations. thus, the value of global analyses of aggregated cancer data sometimes lies in systematically subsetting data based on key covariates post aggregation. this has been observed in driver gene identification. inspired by our analysis of tsg and oncogene conditional essentiality above, we next used candi to identify genes that are conditionally essential in the context of several hundred cancer driver mutations. we first grouped driver mutations (e.g. nonsense or missense) for each driver gene. for this analysis, we selected several thousand genes that are in the - th percentile of essentiality within the depmap data and therefore conditionally essential, meaning these genes are required for cell growth or survival in a subset of cell lines. importantly, it is not known why these several thousand genes are conditionally essential. we then tested whether each of these conditionally essential genes has a significant association with individual driver mutations. our analytic approach does not weight the number of cell models representing each driver mutation, nor does it give information on phenotype effect sizes.
our analysis nominates a large number of conditionally dependent genetic relationships with both tsgs and oncogenes (fig. c,d and supplementary table ). a number of the conditional genetic dependencies identified in our independent variable analysis above are represented by a limited number of cell models, so further investigation is needed to validate these conditional dependencies, but these data further suggest that averaging genetic dependencies across diverse cell lines with un-modeled covariates obscures conditional ssl relationships. to further investigate this hypothesis, we analyzed these same conditional genetic relationships with a second analytic approach that weights the number of cell models representing each driver mutation. we observed a limited number of conditional genetic dependencies that largely consist of oncogene self-essential dependencies, as previously highlighted for kras-mutant cell lines (fig. e-g and supplementary table ). thus, analysis that averages each conditional phenotype across diverse panels of cell lines with unknown covariates masks interesting conditional genetic dependencies.

figure . (a) average gene essentiality for kras and egfr in groups of nsclc cell lines stratified by kras mutation status or by both kras and egfr mutation status. n= for kras-wildtype shown in blue, n= for kras-mutant shown in blue.
n= for kras-wildtype egfr-wildtype shown in grey and n= for kras-mutant egfr-wildtype shown in grey. gene essentiality is an averaged bayes factor score for each group of cell lines. (b) average gene essentiality for kras and egfr in groups of nsclc cell lines stratified by egfr mutation status or by both egfr and kras mutation status. n= for egfr-wildtype shown in blue, n= for egfr-mutant shown in blue. n= for egfr-wildtype kras-wildtype shown in grey and n= for egfr-mutant kras-wildtype shown in grey. gene essentiality is an averaged bayes factor score for each group of cell lines. (c) p-values from chi-squared tests of gene essentiality and nonsense mutations. (d) p-values from chi-squared tests of gene essentiality and missense mutations. (e) a scatter plot showing effect size of the change in gene essentiality with select missense mutations and the -log (p-value) of each essentiality/mutation pair. (f) a scatter plot showing effect size of the change in gene essentiality with select nonsense mutations and the -log (p-value) of each essentiality/mutation pair. (g) a scatter plot showing effect size of the change in gene essentiality with all mutations and the -log (p-value) of each essentiality/mutation pair.

candi reveals female and male context specific essential genes in colon, lung and pancreatic cancer. cancer functional genomics data is often analyzed without consideration for fundamental biological properties such as the sex of the tumor from which each cell line is derived. it is well established that biological sex influences cancer predisposition, cancer progression and response to therapy. we hypothesized that individual genes may be differentially essential across male and female cell lines.
this hypothesis, to our knowledge, has never been tested in an unbiased large-scale manner. to maximize our statistical power to identify such differences we chose to test this hypothesis in a disease setting with a large number of relatively homogeneous cell lines and fewer unknown covariates. using candi, we stratified all kras-mutant nsclc, pancreatic adenocarcinoma (pdac), and colorectal cancer (crc) models by sex and then tested for conditional gene essentiality. this analysis identified a number of genes that are differentially essential in male or female kras-mutant nsclc, pdac and crc models (fig. a-f and supplementary table ). the genes that we identify are not common across all three disease types, suggesting, as one might expect, that the biology of the tumor in part also determines gene essentiality. to test whether any association between differentially essential genes could be identified from expression data (e.g. essential genes encoded on the y chromosome) we first used candi to identify genes that are differentially expressed between male and female cell lines within each disease. we then plotted the set of differentially essential genes against the differentially expressed genes in kras-mutant nsclc, pdac and crc models (fig. a,c,e and supplementary table ) and found little overlap between these gene lists. a number of genes that are more essential in male cells, such as ahcyl , eno , gpi and pkm, regulate cellular metabolism. this finding is consistent with previous literature on sex and metabolism. our analysis demonstrates that stratifying groups of heterogeneous cancer models by three variables, in this case tumor type, kras mutation status and sex, reveals differentially essential genes. candi enables biologically principled stratification of data in the ccle and depmap by any feature associated with a group of cell models. this stratification allows us to identify genes associated with sex, which is not possible with other covariates included.
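the three-variable stratification described above can be sketched in a few lines. this is an illustrative example under stated assumptions, not candi's interface: the record layout, the field names and the bayes factor values are hypothetical, and only the mann-whitney u statistic is computed by hand (a library routine such as scipy.stats.mannwhitneyu would be used to obtain a p-value).

```python
from statistics import mean

# hypothetical annotations: one record per cell line, with the bayes factor
# essentiality score ("bf") of a single gene of interest.
cell_lines = [
    {"name": "m1", "tumor": "nsclc", "kras_mutant": True, "sex": "male", "bf": 40.0},
    {"name": "m2", "tumor": "nsclc", "kras_mutant": True, "sex": "male", "bf": 30.0},
    {"name": "f1", "tumor": "nsclc", "kras_mutant": True, "sex": "female", "bf": 10.0},
    {"name": "f2", "tumor": "nsclc", "kras_mutant": True, "sex": "female", "bf": 20.0},
    {"name": "x1", "tumor": "crc", "kras_mutant": True, "sex": "male", "bf": 99.0},
    {"name": "x2", "tumor": "nsclc", "kras_mutant": False, "sex": "female", "bf": -50.0},
]


def stratify(records, tumor, kras_mutant, sex):
    """select scores for cell lines matching all three stratification variables."""
    return [
        r["bf"]
        for r in records
        if r["tumor"] == tumor and r["kras_mutant"] == kras_mutant and r["sex"] == sex
    ]


def mann_whitney_u(xs, ys):
    """u statistic for xs: count of (x, y) pairs with x > y, ties count one half."""
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


male = stratify(cell_lines, "nsclc", True, "male")
female = stratify(cell_lines, "nsclc", True, "female")

# magnitude of differential essentiality: difference in mean bayes factors
difference = mean(male) - mean(female)
u_statistic = mann_whitney_u(male, female)
print(difference, u_statistic)
```

the crc and kras-wildtype records are excluded before any comparison is made, which is the point of stratifying first and testing second.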
figure . (a) differential gene expression and differential gene essentiality in male and female crc cell lines. n= male cell lines and n= female cell lines. (b) the distribution of bayes factor gene essentiality scores in male and female crc cell lines. the top seven and bottom three differentially essential genes are shown in violin plots split by the sex of the cell lines. (c) differential gene expression and differential gene essentiality in male and female nsclc cell lines. n= male cell lines and n= female cell lines. (d) the distribution of bayes factor gene essentiality scores in male and female nsclc cell lines. the top seven and bottom three differentially essential genes are shown in violin plots split by the sex of the cell lines. (e) differential gene expression and differential gene essentiality in male and female pdac cell lines. n= male cell lines and n= female cell lines. (f) the distribution of bayes factor gene essentiality scores in male and female pdac cell lines. the top seven and bottom three differentially essential genes are shown in violin plots split by the sex of the cell lines.

candi enables rapid integration of external datasets to reveal new immunotherapy targets. an emerging challenge in cancer biology is how to robustly integrate larger “resource” datasets like ccle with the vast amount of published data from individual laboratories.
for example, a big challenge in antibody discovery is identifying specific surface markers on cancer cells. to approach these big questions we utilized candi's ability to rapidly take new datasets, such as raw rna-seq counts data from a disparate study of interest, then normalize and integrate these data into the ccle, depmap and protein localization databases previously described. specifically, we rapidly integrated an rna-seq expression dataset that measured the set of transcribed genes in primary lung bronchial epithelial cells from donors. classes within candi enable rapid application of deseq to assess the differential expression between outside datasets and the ccle. we used this feature to identify genes that are differentially expressed between primary lung bronchial epithelial cells and kras-mutant nsclc, egfr-mutant nsclc or all nsclc models in ccle. we then used candi to identify genes that are upregulated in cancer cells over normal lung bronchial epithelial cells with protein products that are localized to the cell membrane. this analysis of kras-mutant, egfr-mutant and pan-nsclc generated highly similar lists of differentially expressed surface proteins (fig. a-f and supplementary table ). notably, overexpression of several of these genes, such as cd and cd , has been observed in lung cancer and is associated with poor prognosis. these proteins represent potential new immunotherapy targets in kras-driven nsclc.

figure . (a) a graph showing genes that are upregulated in kras-mutant nsclc cell lines relative to primary human bronchial epithelial cells. a cell membrane protein localization score is shown for each gene. higher protein localization scores indicate higher confidence annotations. (b) a scatter plot showing gene expression for genes that encode cell surface proteins in kras-mutant nsclc cell lines and primary human bronchial epithelial cells. n= for kras-mutant nsclc cell lines and n= for primary human bronchial epithelial cells. (c) a graph showing genes that are upregulated in egfr-mutant nsclc cell lines relative to primary human bronchial epithelial cells. a cell membrane protein localization score is shown for each gene. higher protein localization scores indicate higher confidence annotations. (d) a scatter plot showing gene expression for genes that encode cell surface proteins in egfr-mutant nsclc cell lines and primary human bronchial epithelial cells. n= for egfr-mutant nsclc cell lines and n= for primary human bronchial epithelial cells. (e) a graph showing genes that are upregulated in nsclc cell lines relative to primary human bronchial epithelial cells. a cell membrane protein localization score is shown for each gene. higher protein localization scores indicate higher confidence annotations. (f) a scatter plot showing gene expression for genes that encode cell surface proteins in nsclc cell lines and primary human bronchial epithelial cells. n= for nsclc cell lines and n= for primary human bronchial epithelial cells.

discussion

data integration is a critical requirement in biology research in the era of genomics and functional genomics. large scale efforts such as the ccle have revealed genomic features of more than cell line models. these data have not, to our knowledge, previously been integrated with functional genomics data in a manner that lets individual users enter batched queries that are stratified by disease subtype or mutation status.
this is not just a small improvement in functionality, but rather it is an enabling format that makes possible the types of conditional genomics analyses that drive discovery. moreover, it fills a fundamental gap in the cancer research community by integrating large scale projects with investigator-initiated studies. our data framework enables biologists without specialized expertise in bioinformatics to use the full spectrum of data in the ccle and depmap in a higher-throughput and more precise manner. using candi, we identified genes that are selectively essential in male versus female kras-mutant nsclc, pdac and crc models. to our knowledge, such analysis has never been performed to begin to query the biologic basis of sex disparity in cancer or cancer therapy. we illustrate another feature of our framework by analyzing a list of hit genes nominated by a bespoke crispr drug screen for gene essentiality in brca / -wildtype and brca / -mutated breast and ovarian cancer. in a third application, we analyzed the principle of synthetic lethality for genes in kras-mutant and egfr-mutant nsclc models. we then used candi to globally identify genes that are conditionally essential in the context of common cancer driver mutations. finally, we nominated potential new immunotherapy targets in kras-mutant, egfr-mutant and pan-nsclc models by using candi to identify genes that are differentially expressed in nsclc models versus normal bronchial epithelial cells and whose products are localized at the plasma membrane. our data reveal a wealth of new hypotheses that can be rapidly generated from publicly available cancer data.
by sharing data flows and use cases with a candi community we illustrate the ways in which individual research groups can interact with massive cancer genomics projects without reinventing tools or relying upon depmap tool releases. we anticipate that candi will be widely used in cell biology, immunology and cancer research.

methods

candi

the candi data integrator is available at https://github.com/yogiski/candi.

candi module structure

the candi data integrator is a python library built on top of pandas that is specialized in integrating the publicly available data from the cancer dependency map (depmap release: quarter ), the cancer cell line encyclopedia (ccle release: quarter ), the pooled in-vitro crispr knockout essentiality screens database (pickles library: avana quarter ), the comprehensive resource of mammalian protein complexes (corum) and protein localization data from the cell atlas, the map of the cell, and the in silico surfaceome. data from depmap and ccle used in the following analyses are from the q release. data from pickles is from the quarter release of depmap using the avana library. access to all datasets is controlled via a python class called data. upon import, the data class reads the config file established during installation, defines unique paths to each dataset and automatically loads the cell line index table and the gene index table. installation of candi, configuration, and data retrieval is handled by a manager class that is accessed indirectly through installation scripts and the data class. interactions with this data are controlled through a parent entity class and several handlers.
the biologically relevant abstraction classes (gene, cellline, cancer, organelle, genecluster, celllinecluster) inherit their methods from entity. entity methods are wrappers for hidden data handler classes that perform specific transformations, such as data indexing and high throughput filtering.

differential expression

in all cases where it is mentioned, differential expression was evaluated using the deseq r package (release . ). significance was considered to be an adjusted p-value of less than . .

differential essentiality

essentiality scores are taken from the pickles database (avana q ). to reduce the number of hypotheses posed during this analysis, the mutual information of gene essentiality was calculated using the mutual information metric from the python package scikit-learn (version . . ). genes with mutual information scores greater than one standard deviation above the median were removed from consideration. differential essentiality was evaluated by performing a mann-whitney u-test between two groups on every gene that passed the mutual information filter. significance was considered to be a p-value of less than . . magnitude of differential essentiality of a given gene was shown as the difference in mean bayes factors between two groups of cell lines.

protein localization confidence

protein localization data was assembled from the cell atlas, the map of the cell, and the in silico surfaceome. confidence annotations were taken from the supplemental data of each paper, put on a number scale from to and summed into a total confidence score for each localization annotation for every gene across all three papers.
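the summation of confidence annotations can be sketched as follows. this is a hedged illustration only: the numeric scale used in the paper is not recoverable from this text, so the label-to-score mapping below is an assumption, and the source names, genes and annotations are hypothetical.

```python
from collections import defaultdict

# assumed mapping from a confidence label to a number; the scale actually
# used in the analysis is not stated here, so these values are placeholders.
CONFIDENCE_SCALE = {"low": 1, "medium": 2, "high": 3}

# hypothetical annotations: (source, gene, localization, confidence label)
annotations = [
    ("source_1", "GENE1", "cell membrane", "high"),
    ("source_2", "GENE1", "cell membrane", "medium"),
    ("source_3", "GENE1", "nucleus", "low"),
    ("source_1", "GENE2", "cell membrane", "low"),
]


def total_confidence(rows):
    """sum numeric confidence per (gene, localization) across all sources."""
    totals = defaultdict(int)
    for _source, gene, location, label in rows:
        totals[(gene, location)] += CONFIDENCE_SCALE[label]
    return dict(totals)


scores = total_confidence(annotations)
print(scores)
```

a localization supported by several sources accumulates a higher total than one reported by a single source, which matches the stated intent that higher scores indicate higher confidence annotations.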
the analysis shown in figure represents a gene list that was further manually curated to remove the genes that are localized to the intracellular space at the cell membrane, revealing cell surface protein targets that are highly expressed in nsclc models over normal lung bronchial epithelial cells.

depmap creative commons license

when an individual user runs candi they are downloading depmap data and thus are agreeing to a cc attribution . license (https://creativecommons.org/licenses/by/ . /).

synthetic lethality of fanconi anemia genes in ovarian and breast cancer models

we made a list of the top gene hits that confer sensitivity to parp inhibition in hela cells. using candi, the essentiality scores of these top hits were visualized across all ovarian cancer cell models in pickles (avana q ). fanca and fance showed selective essentiality in the brca / mutant ovarian cancer cell lines. following this observation, candi was used to gather the gene essentiality for all fanc genes in the fanconi anemia pathway. candi was then used to visualize these data across all ovarian and breast cancer cell lines, sorting by brca / mutation status.

synthetic lethality in kras and egfr mutant cell lines

candi was leveraged to bin nsclc cell lines present in both ccle (release: q ) and pickles (avana q ) into groups: kras mutant and kras wild type cell lines, with and without egfr mutants removed, as well as egfr mutant and egfr wild type cell lines, with and without kras mutants removed. the mean essentiality score for every gene in the genome was calculated for every group of cell lines.
synthetic lethality score per gene is defined as the change in mean essentiality from the mutant groups to the wild type groups.

pan cancer synthetic lethality analysis
a set of core oncogenes and tumor suppressor driver mutations was chosen for analysis. to test the effect of these genes' mutations on gene essentiality, candi was leveraged to split them into two groups: a nonsense mutation group containing genes annotated as tumor suppressors (n= ) and a missense mutation group containing genes annotated as oncogenes with specific driver protein changes (n= ). candi was then used to collect a core set of genes with highly variable essentiality. to do this, the bayes factors from the pickles database (avana q ) were converted to binary numeric variables: bayes factors over were assigned a =essential and bayes factors under were assigned a =non-essential. genes were then sorted by their variance across cell lines, and genes between the th and th percentile were used for this analysis (n= ). to determine a short list of genes to follow up on, chi-square tests were applied to the gene pairs in the missense group and the gene pairs in the tumor suppressor group. three new groups were formed for further analysis: the first consisted of the significant gene/mutation pairs from the oncogenic group, the second consisted of the significant gene/mutation pairs from the tumor suppressor group, and the third was a combination of the significant pairs from both groups with no discrimination on the type of mutations considered. these groups were further analyzed for differential essentiality via the mann-whitney method described above, and the cohen's d effect size was calculated to measure the extent of the phenotype.
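the statistical steps named in this section can be sketched with the python standard library alone. this is an illustrative reimplementation under stated assumptions, not the authors' code: the binarization threshold is a free parameter (the actual cutoffs are not given here), the chi-square test is the uncorrected pearson test on a 2x2 table, the mann-whitney p-value uses the normal approximation without continuity correction, and all data values are made up.

```python
import math
from statistics import mean, variance


def binarize(bayes_factors, threshold):
    """1 = essential (BF above threshold), 0 = non-essential."""
    return [1 if bf > threshold else 0 for bf in bayes_factors]


def chi_square_2x2(mutated, essential):
    """Pearson chi-square test (no continuity correction) for two 0/1 vectors."""
    n = len(mutated)
    table = [[0, 0], [0, 0]]  # table[mutation status][essentiality]
    for m, e in zip(mutated, essential):
        table[m][e] += 1
    row = [sum(table[0]), sum(table[1])]
    col = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected == 0:
                raise ValueError("degenerate table: an expected count is zero")
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via the tie-corrected normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n, n1, n2 = len(pooled), len(x), len(y)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2.0) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2.0))


def cohens_d(group_a, group_b):
    """Cohen's d with the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Made-up Bayes factors for one gene across eight cell lines.
bf = [60, 45, 30, 50, -10, 5, -20, 40]
mutated = [1, 1, 1, 1, 0, 0, 0, 0]          # driver mutation present?
essential = binarize(bf, threshold=20)       # hypothetical cutoff
chi2, chi2_p = chi_square_2x2(mutated, essential)

mt = [b for b, m in zip(bf, mutated) if m == 1]
wt = [b for b, m in zip(bf, mutated) if m == 0]
u_stat, u_p = mann_whitney_u(mt, wt)
effect = cohens_d(mt, wt)
```

in practice a tested statistics library would normally be preferred over hand-written tests; the sketch is kept dependency-free only so that each step is visible.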
differential expression and essentiality of male and female kras driven cancers
we used candi to gather all cell lines that are present in both pickles (avana q ) and ccle (release q ). candi was then leveraged to put these cell lines into the following tissue groups: kras mutant colon/colorectal, pdac, and nsclc. each tissue group was then split into male and female sub-groups. differential expression was analyzed by applying the methods described above to raw rna-seq counts data from ccle (release: q ). genes with adjusted p-values less than . were considered significantly differentially expressed. differential essentiality was analyzed using the methods described above on the previously described sex sub-groups for each tissue type. genes with p-values less than . were considered significantly differentially essential between male and female cell models. for each tissue type, the distributions of the top significantly differentially essential genes were highlighted in comparison with the bottom as a negative control.

differential expression of benign and malignant cancer cell lines
we downloaded human bronchial epithelial (hbe) rna-seq data from gillen et al via the european nucleotide archive to use as a benign lung tissue model. this data set contains gene expression data for primary hbe cells cultured from three different donors and also nhbe cells (lonza cc- , a mixture of hbe and human tracheal epithelial cells). we then used candi to put nsclc models into three different groups: kras mutant, egfr mutant, and all cell lines. for our benign model, raw counts were quantified via kallisto. raw counts for our malignant cell lines were queried via candi.
deseq was then applied to evaluate the differential expression between our normal lung tissue model and our three malignant lung tissue groups. the results from deseq were then filtered by significance (adjusted p-value < . ). to filter based on potential immunotherapy targets, we removed all genes not annotated as being localized to the plasma membrane, and genes with localization confidence scores lower than six. genes that were obviously mis-annotated as surface proteins were also manually removed.

supplementary figure/table legends
supplementary figure . an object-oriented schema diagram showing the core structure of the candi software.
supplementary table . a table containing raw pickles bayes factors displayed in the heat map of fig. e.
supplementary table . a table containing mean pickles bayes factors for each series displayed in fig. a,b.
supplementary table . a table containing the data for all chi-square tests performed to generate fig. c,d.
supplementary table . a table containing the data for scatter plots shown in fig. e,f,g.
supplementary table . a table containing the data from the differential essentiality analysis for all three tissues in fig. a-f.
supplementary table . a table containing the data from the differential expression analysis for all three tissues in fig. a,c,e.
supplementary table . a table containing the differential expression analysis data merged with the location data for all three tissues shown in fig. .

acknowledgements
we thank everyone in the gilbert lab for helpful comments and discussion. lag is supported by k /r ca and dp ca as well as the goldberg-benioff endowed professorship in prostate cancer translational biology.

conflicts of interest
none

bibliography
ghandi, m. et al. next-generation characterization of the cancer cell line encyclopedia. nature , – ( ).
li, h. et al. the landscape of cancer cell line metabolism. nat. med. , – ( ).
tsherniak, a. et al. defining a cancer dependency map. cell , - .e ( ).
thul, p. j. et al. a subcellular map of the human proteome. science , ( ).
cancer cell line encyclopedia consortium & genomics of drug sensitivity in cancer consortium. pharmacogenomic agreement between two cancer cell line data sets. nature , – ( ).
barretina, j. et al. the cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. nature , – ( ).
bausch-fluck, d. et al. the in silico human surfaceome. pnas , e –e ( ).
giurgiu, m. et al. corum: the comprehensive resource of mammalian protein complexes- . nucleic acids res. , d –d ( ).
nusinow, d. p. et al. quantitative proteomics of the cancer cell line encyclopedia. cell , - .e ( ).
szklarczyk, d. et al. the string database in : quality-controlled protein-protein association networks, made broadly accessible. nucleic acids res. , d –d ( ).
itzhak, d. n., tyanova, s., cox, j. & borner, g. h. global, quantitative and dynamic mapping of protein subcellular localization. elife , ( ).
meyers, r. m. et al. computational correction of copy number effect improves specificity of crispr-cas essentiality screens in cancer cells. nat. genet. , – ( ).
behan, f. m. et al. prioritization of cancer therapeutic targets using crispr–cas screens. nature , – ( ).
wang, t. et al. identification and characterization of essential genes in the human genome. science , – ( ).
hart, t. et al. high-resolution crispr screens reveal fitness genes and genotype-specific cancer liabilities. cell , – ( ).
wang, t. et al. gene essentiality profiling reveals gene networks and synthetic lethal interactions with oncogenic ras. cell , - .e ( ).
chan, e. m. et al. wrn helicase is a synthetic lethal target in microsatellite unstable cancers. nature , – ( ).
adamson, b. et al. a multiplexed single-cell crispr screening platform enables systematic dissection of the unfolded protein response. cell , - .e ( ).
wainberg, m. et al. a genome-wide almanac of co-essential modules assigns function to uncharacterized genes. http://biorxiv.org/lookup/doi/ . / ( ) doi: . / .
lenoir, w. f., lim, t. l. & hart, t. pickles: the database of pooled in-vitro crispr knockout library essentiality screens. nucleic acids res , d –d ( ).
bausch-fluck, d. et al. a mass spectrometric-derived cell surface protein atlas. plos one , ( ).
o’connor, m. j. targeting the dna damage response in cancer. mol. cell , – ( ).
zimmermann, m. et al. crispr screens identify genomic ribonucleotides as a source of parp-trapping lesions. nature , – ( ).
pan, x. et al. fancm, brca , and blm cooperatively resolve the replication stress at the alt telomeres. pnas , e –e ( ).
lou, k., gilbert, l. a. & shokat, k. m. a bounty of new challenging targets in oncology for chemical discovery. biochemistry , – ( ).
narayan, g. et al. promoter hypermethylation of fancf: disruption of fanconi anemia-brca pathway in cervical cancer. cancer res , – ( ).
ideker, t., dutkowski, j. & hood, l. boosting signal-to-noise in complex biology: prior knowledge is power. cell , – ( ).
chang, m. t. et al. identifying recurrent mutations in cancer reveals widespread lineage diversity and mutational specificity. nat. biotechnol. , – ( ).
lou, k. et al. krasg c inhibition produces a driver-limited state revealing collateral dependencies. sci signal , ( ).
cancer disparities - national cancer institute. https://www.cancer.gov/about-cancer/understanding/disparities ( ).
love, m. i., huber, w. & anders, s. moderated estimation of fold change and dispersion for rna-seq data with deseq . genome biology , ( ).
rubin, j. b. et al. sex differences in cancer mechanisms. biol sex differ , ( ).
gillen, a. e. et al. molecular characterization of gene regulatory networks in primary human tracheal and bronchial epithelial cells. j. cyst. fibros. , – ( ).
mj, k. et al. prognostic significance of cd overexpression in non-small cell lung cancer. lung cancer (amsterdam, netherlands) vol. https://pubmed.ncbi.nlm.nih.gov/ / ( ).
ko, y. h. et al. prognostic significance of cd s expression in resected non-small cell lung cancer. bmc cancer , ( ).
penno, m. b. et al. expression of cd in human lung tumors. cancer res , – ( ).
bailey, m. h. et al. comprehensive characterization of cancer driver genes and mutations. cell , - .e ( ).
bray, n. l., pimentel, h., melsted, p. & pachter, l. near-optimal probabilistic rna-seq quantification. nat biotechnol , – ( ).

[figure: schematic of a genome-scale crispr screen (lentiviral transduction of a genome-scale crispr sgrna library into the hela cell line, olaparib versus untreated, sgrna abundance counted by deep sequencing to measure gene/drug phenotypes) and of candi (cancer data integrator) integrating cellular genomics, functional genomics, transcriptomics and proteomics data, shown with breast, cervical and ovarian cancer cell lines; panels a-e]

[figure: differential essentiality (Δ average bf) versus differential expression (log (fc)) between male and female cell lines, with violin plots of bayes factors per gene for top hits and negative controls and the essential gene threshold marked; tissues: colon, pancreas, lung; panels a-f]

[figure: log (fold change) versus -log (q value), and log (tpm + ) per gene for benign bronchial versus malignant cell lines, for kras mutant, egfr mutant and all lung cancer groups; legend: location confidence; panels a-f]

[figure: gene essentiality (average bf) in kras mutant versus kras wild type cell lines and in egfr mutant versus egfr wild type cell lines, with the essential gene threshold marked; effect sizes and p-values of essentiality/mutation pairs for missense oncogenes, nonsense tumor suppressor genes and all mutations; panels a-g]

rdrugtrajectory: an r package for the analysis of drug prescriptions in electronic health care records

journal of statistical software, mmmmmm yyyy, volume vv, issue ii. doi: . /jss.v .i

anthony nash (university of oxford), tingyee e. chang (university of oxford), benjamin wan (kings college london), m. zameel cader (university of oxford)

abstract
primary care electronic health care records are rich with patient and clinical information. studying electronic health care records has resulted in marked improvements to national health care processes and patient-care decision making, and is a powerful supplementary source of data for drug discovery efforts. we present the r package rdrugtrajectory, designed to yield demographic and patient-level characteristics of drug prescriptions in the uk clinical practice research datalink dataset. the package operates over clinical practice research datalink gold clinical, referral and therapy datasets and includes features such as first drug prescriptions analysis, cohort-wide prescription information, cumulative drug prescription events, the longitudinal trajectory of drug prescriptions, and a survival analysis timeline builder to identify risks related to drug prescription switching. the rdrugtrajectory package has been made freely available via the github repository.
keywords: ehr, electronic health care records, cprd, clinical practice research datalink, prescriptions, r, therapeutics, drug discovery, clinical epidemiology.

this preprint (which was not certified by peer review) is made available under a cc-by-nd . international license (http://creativecommons.org/licenses/by-nd/ . /). the copyright holder is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. this version posted january , . ; https://doi.org/ . / . . .

introduction
the uk clinical practice research datalink (cprd) service offers high quality longitudinal data on million patients with up to years of follow-up for % of those patients. the service provides drug treatment patterns, feasibility studies and health care resource use studies. patient electronic health care records (ehr) are stored as coded and anonymised data and sourced from over , primary care practices across england. cprd holds information on consultation events, medical diagnoses, symptoms, prescriptions, vaccination history, laboratory tests, and referrals. cprd can provide routine linkage to other health-related patient datasets, for example: small area level data, such as patient and/or practice postcode linked deprivation measures; data from nhs digital, which includes hospital episode statistics, outpatient and accident and emergency data; and cancer data from public health england.

evidence from ehrs is making an impact on primary care decision-making and best practice oyinlola et al. ( ). with nationwide longitudinal datasets more readily available, the evaluation of treatments over long timescales can contribute to clinical decision-making hepp et al. ( ).
for example, adverse events caused by prescription medication can be studied using retrospective data in situations where randomized clinical trials may prove impractical ghosh et al. ( ); bally et al. ( ).

this publication serves as an introduction to the rdrugtrajectory r package. whilst it is by no means a complete tutorial, we expand on some of the main package features, such as how to: isolate patients by first drug prescriptions at given clinical events; calculate time-invariant prescriptions; construct survival analysis timelines (compatible with cox proportional hazard regression and kaplan meier curves); and visualise patient prescription switching. for a comprehensive list of functions please visit the github repository https://github.com/acnash/rdrugtrajectory. almost all features can be controlled by covariates or stratified by some variable, for example, by gender, age, medical codes or treatment product codes.

the example code, figures and data structures presented here mimic a small fraction of our own research. in the interest of patient confidentiality, the clinical data used in the analysis have been fabricated. we present a brief tour of some of the functions available, starting with a discussion on the cprd data structure and how records must be formatted. a glossary of terms has been provided (table ) to assist the reader.

rdrugtrajectory package and data structures

rdrugtrajectory availability and installation
rdrugtrajectory is free to download from the github repository https://github.com/acnash/rdrugtrajectory and is released under an mit license. fabricated cprd clinical and cprd prescription records, in addition to age, gender and index of multiple deprivation scores, are included for test and tutorial purposes. before installing the package, the following r dependencies are required: plyr, dplyr, foreach, doparallel, data.table, parallel, splus r, rlist, reda, ggplot , ggalluvial, stats, utils and useful.
the latest rdrugtrajectory binary is installed using:

install.packages("path/to/tar/file", repos = NULL, type = "source")

rdrugtrajectory was developed and tested on r version . . . please consult the github page for release notes, the latest version and up-to-date installation instructions.

table : table of frequently used terms.
rdrugtrajectory: an r package designed for the management of cprd prescription data.
clinical: the clinicalnnn.txt dataset presented in a rdrugtrajectory dataframe.
referral: the referralnnn.txt dataset presented in a rdrugtrajectory dataframe.
therapy: the therapynnn.txt dataset presented in a rdrugtrajectory dataframe.
additionalnnn.txt: the cprd dataset of additional clinical information, for example, patient smoking status and alcohol consumption. data can be retrieved using cprdlookups.r.
medcode: a cprd identifier that denotes medical conditions, diagnoses and complaints made by a patient. medcodes are recorded in the clinicalnnn.txt and referralnnn.txt files.
prodcode: a cprd identifier that denotes treatment products, including drugs, foods, and medical apparatus. prodcodes are recorded in the therapynnn.txt files.
patid: a unique cprd patient identifier. used to link datasets.
event: any prodcode or medcode in a patient’s ehr.
eventdate: the date of an event recorded by a general practitioner. present in all three datasets and the corresponding rdrugtrajectory dataframes.
imd: index of multiple deprivation score, a uk government socioeconomic measurement based on the postcode of the clinic or a patient’s registered address.
prescription: a general term for any prodcode prescribed for treatment.
medical history: indicates a combination of one or more sets of cprd data, for example, the collection of all clinical and therapy ehr for patients with a medcode for migraine.
product.txt: a plain text file that contains all prodcodes with a description and comes bundled with the cprd data dictionary. the file is used to link a prodcode with a description.

cprd product description
several rdrugtrajectory functions use the cprd product.txt file for assigning a text description to a prescription prodcode. the product.txt file (and medical.txt for medcode descriptions) is available in the cprd data dictionary windows software. it is important that the file remains in plain text, with columns tab-delimited. the files can be simplified by removing all non-essential products. finally, all eleven columns that make up the product.txt file must be available, with the first column containing all prodcodes and the fourth column containing the product description. a simplified product.txt file, presented below, can be downloaded from the github page.

> library(rdrugtrajectory)
> productdf <- read.csv("../rdrugtrajectory_data/product.txt",
+ sep="\t",
+ header=FALSE)
> head(productdf)
v v v v v
atenolol mg tablets atenolol
atenolol mg tablets atenolol
atenolol mg tablets atenolol
amitriptyline mg tablets amitriptyline hydrochloride
lisinopril mg tablets lisinopril
lisinopril mg tablets lisinopril
v v v v
mg tablet oral
mg tablet oral
mg tablet oral
mg tablet oral
/ / mg tablet oral
mg tablet oral
v
beta-adrenoceptor blocking drugs
beta-adrenoceptor blocking drugs
beta-adrenoceptor blocking drugs
tricyclic and related antidepressant drugs/neuropathic pain/prophylaxis of migraine
angiotensin-converting enzyme inhibitors
angiotensin-converting enzyme inhibitors
v v
feb-
feb-
feb-
feb-
feb-
feb-

rdrugtrajectory package structure
rdrugtrajectory contains three r files: (1) all functions related to data curating and searching reside within prddrugtrajectory.r; (2) analysis tools and timeline construction reside within cprddrugtrajectorystats.r; and (3) all utilities, including input/output operations, reside within cprddrugtrajectoryutils.r. the package contains several fabricated cprd datasets: testclinicaldf, testtherapydf, agegenderdf, imddf, and druglistdf. a description of each, along with information on data types and structures, is given below.

the cprd ehr data structure
the structure of cprd gold data may depend on whether the cprd license holder performs intermediate data management steps before releasing data to the user. typically, however, cprd gold data follows the cprd gold specification https://cprdcw.cprd.com/_docs/cprd_gold_full_data_specification_v . .pdf. currently, rdrugtrajectory supports ehr data from the flat files clinicalnnn.txt, referralnnn.txt, and therapynnn.txt. the additional clinical details files (additionalnnn.txt) are currently supported using our released r script cprdlookups.r https://github.com/acnash/cprd_additional_clinical.

patients are assigned a unique numerical patid value. the operations performed by rdrugtrajectory require the patid to identify patients and subset patient groups. we recommend that patid, medcode and prodcode are kept as character data throughout any preliminary data curating steps. medical events are recorded as codes and stored in the clinicalnnn.txt and referralnnn.txt files under the column header medcode. prescription events, such as drug prescriptions, are also recorded as codes and stored in the therapynnn.txt file under the column header prodcode, and the sequences of repeat prescriptions are under the issueseq column header. dates associated with medical and prescription events, recorded by the general practitioner, are stored under the column header eventdate.

essential data types and data structures
rdrugtrajectory can operate over cprd gold ehr clinical, referral and prescription data provided each dataset is presented as a separate r dataframe or combined into a rdrugtrajectory medical history dataframe. the construction of clinical, referral and prescription dataframes requires, as a minimum, a patid and eventdate column, and either medcode or prodcode (for therapy data, issueseq is also necessary), presented in that order. every record of medcode or prodcode must be accompanied by an eventdate entry (encoded as a date class of the form yyyy-mm-dd). patients can have duplicate events within the same data set and between data sets.
medical and prescription codes can be retrieved from the corresponding medical.txt and product.txt files, which come bundled with the cprd data dictionary windows application. rdrugtrajectory comes packaged with fabricated ehr data in the following structure:

> library(rdrugtrajectory)
> #fabricated clinical data (referral data follows the same format)
> names(testclinicaldf)
[1] "patid" "eventdate" "medcode" "consid"
> #fabricated prescription data
> names(testtherapydf)
[1] "patid" "eventdate" "prodcode" "consid" "issueseq"

users can check whether the structure of an ehr dataframe meets the requirements for this package by calling checkcprdrecord; additional columns, such as the consultation identification number (consid), are not considered. in the following instance, a prescription dataset with the required columns and the optional consultation identification number is presented.

> library(rdrugtrajectory)
> #check the structure of testtherapy, specify that it is therapy data
> checkcprdrecord(df=testtherapydf, datatype="therapy")
[1] "the data.frame is appropriately formatted. returning true."
[1] TRUE
> #display the rdrugtrajectory ehr therapy dataframe
> str(testtherapydf, strict.width="wrap")
'data.frame': obs. of variables:
$ patid : int ...
$ eventdate: date, format: " - - " " - - " ...
$ prodcode : int ...
$ consid : int ...
$ issueseq : int ...
users can combine the rdrugtrajectory ehr dataframes with any number of patient and ehr variables to act as covariates and stratifying variables; typically this can be done using the r cbind operation. for example, bmi and smoking status, both of which can be retrieved from the additionalnnn.txt dataset files using cprdlookups.r, can be linked by searching for and binding with the record patid values. the rdrugtrajectory package contains several utility functions to retrieve cprd data, including patient year of birth, gender (male or female) and either patient-level or clinical-level index of multiple deprivation (imd) score. the patient age can be determined by adding to the value in the yob column of the patient cprd ehr dataset and then subtracting that value (birth year) from the year of the cprd database release. these data require preliminary treatment before being presented to the rdrugtrajectory package. patient age, gender and imd score must be presented in a dataframe with the linked patient column patid, along with the columns age, gender, and score. provided the patid column is preserved, patient characteristics can be presented in separate dataframes, for example:

> library(rdrugtrajectory)
> #patient age and gender as one dataframe
> str(agegenderdf, strict.width="wrap")
'data.frame': obs. of variables:
$ patid : int ...
$ yob : num ...
$ gender: int ...
> #clinic-level imd score as one dataframe
> str(imddf, strict.width="wrap")
'data.frame': obs. of variables:
$ patid : int ...
$ pracid: int ...
$ score : int ...

the patid patient identifier is fundamental in every operation performed by rdrugtrajectory. the examples presented here and those in the reference manual rely on searching and subsetting ehr data using a list or vector of patient identifiers. the function getuniquepatidlist will retrieve an r list of patient identification numbers from any dataframe with a patid column.
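linking covariates through patid can be sketched with base r alone. the example below is an illustration under stated assumptions: the dataframes toytherapydf and toycovariatedf and all of their values are fabricated, and merge is shown as one alternative to cbind, not as the approach used inside rdrugtrajectory.

```r
# fabricated inputs; column names follow the text, values are illustrative
toytherapydf <- data.frame(
  patid     = c(3L, 1L, 1L, 2L),
  eventdate = as.Date(c("2012-05-01", "2010-01-05", "2010-02-05", "2011-06-30")),
  prodcode  = c(77L, 42L, 42L, 77L)
)
toycovariatedf <- data.frame(
  patid  = c(1L, 2L, 3L),
  gender = c(1L, 2L, 2L),
  score  = c(4L, 1L, 3L)
)

# merge() matches on patid, so the row order of the two inputs need not agree
linkeddf <- merge(toytherapydf, toycovariatedf, by = "patid", all.x = TRUE)
linkeddf <- linkeddf[order(linkeddf$patid, linkeddf$eventdate), ]

# cbind() is only safe once the covariate rows are aligned to the records,
# here by looking up each record's patid in the covariate table
idx <- match(toytherapydf$patid, toycovariatedf$patid)
bounddf <- cbind(toytherapydf, toycovariatedf[idx, c("gender", "score")])
```

the match step is the part that matters: binding two dataframes column-wise without first aligning on patid would silently attach covariates to the wrong patients.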
the aforementioned rdrugtrajectory ehr dataframes, clinical, referral and therapy, can be combined into a single dataframe. we refer to this dataset instance as the patient’s medical history; it can be constructed using constructmedicalhistory. this dataframe expects events to be in chronological order, and will introduce two new columns, code and codetype, to denote each of the combined events. the code (medcode and/or prodcode) can be distinguished by a codetype value of c (clinical events), r (referral events), and t (prescription events). events are returned in chronological order using the eventdate data. the following code demonstrates how to retrieve a list of patient identifiers from a prescription dataframe and from a medical history dataframe, followed by how to subset using base r operations and, finally, the medical history dataframe structure.

> library(rdrugtrajectory)
> #retrieve patids from therapy data.
> idlist <- getuniquepatidlist(testclinicaldf)
> medhistorydf <- constructmedicalhistory(testclinicaldf, NULL, testtherapydf)
[ ] "using clinical data."
[ ] "using therapy data."
[ ] "building with clinical and therapy data."
> #retrieve patid from medical history.
> medhistoryidlist <- getuniquepatidlist(medhistorydf)
> numofpatients <- length(medhistoryidlist)
> #subset using the first patients.
> smallmedhistorydf <- subset(medhistorydf,
+   medhistorydf$patid %in% medhistoryidlist[ : ])
> #separate out the first patient with a clinical record.
> smallclinicalonlydf <- subset(smallmedhistorydf,
+   smallmedhistorydf$codetype == "c")
> #separate out the first patient with a therapy record.
> smalltherapyonlydf <- subset(smallmedhistorydf,
+   smallmedhistorydf$codetype == "t")
> #subset only those patient records beyond st jan .
> latermedhistorydf <- subset(medhistorydf,
+   medhistorydf$eventdate > as.Date(" - - "))
> #medical history dataframe structure
> str(medhistorydf, strict.width="wrap")
'data.frame': obs. of variables:
$ patid : int ...
$ eventdate: date, format: " - - " " - - " ...
$ code : int ...
$ codetype : chr "c" "c" "c" "t" ...

the patid data can also be used to retrieve patient characteristics, for example, the gender of the patient using getgenderofpatients:

> library(rdrugtrajectory)
> idlist <- getuniquepatidlist(testtherapydf)
> #only use half of the cohort.
> idlist <- idlist[ :(length(idlist)/ )]
> #get gender data by specific gender.
> malecode <-
> femalecode <-
> malepatientsdf <- getgenderofpatients(idlist, agegenderdf, malecode)
> femalepatientsdf <- getgenderofpatients(idlist, agegenderdf, femalecode)
> #get all gender data
> allpatientsdf <- getgenderofpatients(getuniquepatidlist(testtherapydf),
+   agegenderdf)
> #structure of the patient gender data.
> str(allpatientsdf, strict.width="wrap")
'data.frame': obs. of variables:
$ patid : int ...
$ gender: int ...
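the shape of the combined medical history dataframe shown above can be reproduced with a few lines of base r. this is a hypothetical stand-in, not the code behind constructmedicalhistory: the helper asevents, the toy dataframes and the choice to order by patid and then eventdate are all assumptions made for illustration, and the lowercase codetype values follow the listing above.

```r
# fabricated inputs; values are illustrative only
toyclinicaldf <- data.frame(
  patid     = c(1L, 2L),
  eventdate = as.Date(c("2010-02-01", "2011-06-30")),
  medcode   = c(500L, 501L)
)
toytherapydf <- data.frame(
  patid     = c(1L, 1L),
  eventdate = as.Date(c("2010-01-05", "2010-02-01")),
  prodcode  = c(42L, 43L)
)

# rename the code column and tag each row with its source
asevents <- function(df, codecol, codetype) {
  data.frame(patid = df$patid, eventdate = df$eventdate,
             code = df[[codecol]], codetype = codetype,
             stringsAsFactors = FALSE)
}
toyhistorydf <- rbind(asevents(toyclinicaldf, "medcode", "c"),
                      asevents(toytherapydf, "prodcode", "t"))

# chronological order within each patient (assumed ordering rule)
toyhistorydf <- toyhistorydf[order(toyhistorydf$patid, toyhistorydf$eventdate), ]
rownames(toyhistorydf) <- NULL

# base r subsetting by codetype, as in the listing above
therapyonlydf <- subset(toyhistorydf, codetype == "t")
```

because medcode and prodcode share the single code column, codetype is the only way to tell a clinical code from a product code with the same numeric value.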
imd data can be retrieved by combining the getuniquepatidlist and getimdofpatients functions:

> library(rdrugtrajectory)
> idlist <- getuniquepatidlist(testtherapydf)
> #get patients with an imd score of or
> onepatientsdf <- getimdofpatients(idlist, imddf, )
> twopatientsdf <- getimdofpatients(idlist, imddf, )
> #get all imd scores for all patients in testtherapydf
> allpatientsdf <- getimdofpatients(getuniquepatidlist(testtherapydf), imddf)
> #structure of the patient imd data.
> str(allpatientsdf, strict.width="wrap")
'data.frame': obs. of variables:
$ patid: int ...
$ score: int ...

the final example of ehr dataframe manipulation presented here demonstrates how to retrieve all prescription records for patients prescribed a specific prescription treatment. for example, such an operation can be used to retrieve all prescription records for any patient prescribed amitriptyline. in addition, it is also possible to return only prescription records matching specific prescription treatments. importantly, prescription prodcodes can be grouped into lists and used to collect those patients with at least one record that matches an element of that list. this approach is useful if the dose is not relevant to the study or the prescription is dispensed under multiple product names.

> library(rdrugtrajectory)
> #it is easy to retrieve a list of all unique prodcodes in the cohort.
> prodcodesvector <- unique(testtherapydf$prodcode)
> reducedprodcodesvector <- prodcodesvector[ : ]
> #all records are maintained for those patients with a matching prodcode.
> therapyofinterestdf <- getpatientswithprodcode(testtherapydf,
+   reducedprodcodesvector)
> #only those records that match are retained.
> reducedtherapyofinterestdf <- getpatientswithprodcode(testtherapydf,
+   reducedprodcodesvector,
+   removeexcessdrugs=TRUE)

. ehr drug prescription results and discussion

having briefly demonstrated some basic operations for retrieving patient records by matching ehr dataframes against sets of patid values, we move on to showcase several operations available to the user. we begin by presenting examples of cohort prescription summary statistics, followed by methods of dataset curating and stratifying by patient groups. we then present examples of how to search for patients prescribed a first-line treatment, followed by presenting some of these patient groups as sequences of prescriptions. finally, we demonstrate several examples of building time-lines. for further examples, please see the github page and reference manual.

. . cohort summary statistics

geteventdatesummarybypatient

rdrugtrajectory can return summary statistics on patient-level and cohort-level prescription data with geteventdatesummarybypatient and getpopulationdrugsummary, respectively. for example, the prescription history of a single patient (selected via getuniquepatidlist and [] dataframe subsetting) returns the patient patid, the number of prescription events, the median number of days between events, the fewest number of days between events, the greatest number of days between events (maxtime and longestduration are the same), and the record duration (number of days between the first and last prescription event on record):

> library(rdrugtrajectory)
> idlist <- getuniquepatidlist(testtherapydf)
> resultlist <- geteventdatesummarybypatient(
+   testtherapydf[testtherapydf$patid==idlist[[ ]],])
> str(resultlist, strict.width="wrap")
list of
$ timeserieslist: num [ : ]
$ summarydf :'data.frame': obs. of variables:
..$ patid : int
..$ numberofevents : int
..$ mediantime : num
..$ mintime : num
..$ maxtime : num
..$ longestduration: num
..$ recordduration : int
- attr(*, "class")= chr "eventdatesummaryobj"

getpopulationdrugsummary

this approach can be extended across the cohort of patients with getpopulationdrugsummary. the returned populationeventdatesummary s object is a list of three elements. the first element is the summarydf dataframe derived from calling geteventdatesummarybypatient per patient, with each set of statistics retrievable through the accompanying patid. the second element is the timeserieslist, which holds a vector per patient of the number of days between consecutive prescription events. vectors can be accessed using the patid element name:

> library(rdrugtrajectory)
> resultlist <- getpopulationdrugsummary(df = testtherapydf,
+   prodcodesvector = NULL)
> str(resultlist, strict.width="wrap", list.len = )
list of
$ summarydf :'data.frame': obs. of variables:
..$ patid : int [ : ] ...
..$ numberofevents : int [ : ] ...
..$ mediantime : num [ : ] . ...
..$ mintime : num [ : ] ...
..$ maxtime : num [ : ] ...
.. [list output truncated]
$ timeserieslist:list of
..$ : num [ : ]
..$ : num [ : ] ...
..$ : num
..$ : num
..$ : num [ : ] ...
.. [list output truncated]
- attr(*, "class")= chr "populationeventdatesummary"
> #get all patids for patients younger than .
> ageidlist <- getuniquepatidlist(agegenderdf[agegenderdf$yob < ,])
> timeserieslist <- resultlist[[ ]]
> #get all patids of available data.
> recordpatids <- names(timeserieslist)
> #get time data for the intersect of those patids of patients < and the patids
> #of available data.
> subtimelist <- timeserieslist[intersect(ageidlist, recordpatids)]
> str(subtimelist, strict.width="wrap", list.len = )
list of
$ : num
$ : num
$ : num
$ : num
$ : num
[list output truncated]

. . curating drug prescription records

there is no direct link between a prescription event and a medcode in the cprd data. the relationship between the two can be inferred from the event dates of the prescription and clinical events, in addition to information provided by the consultation id and the prescription issue number.

matchdrugwithdisease

rdrugtrajectory provides several methods for curating prescription datasets with the aim of establishing a relationship between prescription and clinical events. the matchdrugwithdisease function returns a subset of all prescription events with an established relationship between therapy and clinical event. the degree to which these patients are included in the search is controlled with a function argument. there are three scenarios: all patients with a record of a specific prescription event and a specific clinical event, at any point; all patients with a record of a specific prescription event on the same date as a specific clinical event; and all patients with a record of a specific prescription event on the same date as a specific clinical event and clear from additional clinical events on that day.
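the three scenarios can be made concrete with a small base r sketch. the function matchsketch and its level names ("anytime", "sameday", "samedayonly") are hypothetical and were chosen for this illustration; they are not the arguments of matchdrugwithdisease, and the sketch returns patid values only, on fabricated data.

```r
# hypothetical sketch of the three matching scenarios; not package code
matchsketch <- function(clinicaldf, therapydf, medcodes, drugcodes,
                        level = c("anytime", "sameday", "samedayonly")) {
  level <- match.arg(level)
  clin <- clinicaldf[clinicaldf$medcode %in% medcodes, ]
  drug <- therapydf[therapydf$prodcode %in% drugcodes, ]
  if (level == "anytime") {
    # a matching drug and a matching clinical event, on any dates
    return(sort(intersect(drug$patid, clin$patid)))
  }
  # pair events that share both patient and date
  same <- merge(drug, clin, by = c("patid", "eventdate"))
  if (level == "sameday") return(sort(unique(same$patid)))
  # drop pairs where another, off-topic clinical event falls on that date
  other <- clinicaldf[!(clinicaldf$medcode %in% medcodes), ]
  key <- function(df) paste(df$patid, df$eventdate)
  sort(unique(same$patid[!(key(same) %in% key(other))]))
}

# fabricated data: medcode 500 is the event of interest, 900 is off-topic
toyclinicaldf <- data.frame(
  patid     = c(1L, 2L, 2L, 3L),
  eventdate = as.Date(c("2010-01-05", "2010-03-01", "2010-03-01", "2009-01-01")),
  medcode   = c(500L, 500L, 900L, 500L)
)
toytherapydf <- data.frame(
  patid     = c(1L, 2L, 3L, 4L),
  eventdate = as.Date(c("2010-01-05", "2010-03-01", "2010-07-07", "2010-08-08")),
  prodcode  = c(42L, 42L, 42L, 42L)
)

counts <- sapply(c("anytime", "sameday", "samedayonly"), function(lv) {
  length(matchsketch(toyclinicaldf, toytherapydf, 500L, 42L, lv))
})
```

on this toy cohort the number of matched patients falls as each further condition is applied, since every scenario is a subset of the one before it.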
one would expect fewer patients as the stringency of the search criteria is increased:

> library(rdrugtrajectory)
> prodcodes <- unique(testtherapydf$prodcode)
> amitriptylinecodes <- prodcodes[ : ]
> propranololcodes <- prodcodes[ : ]
> medcodelist <- unique(testclinicaldf$medcode)
> headachecodes <- medcodelist[ : ]
> amitriptylineresult <- matchdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodelist = headachecodes,
+   drugcodelist = amitriptylinecodes,
+   severity = )
> amitriptylineresult <- matchdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodelist = headachecodes,
+   drugcodelist = amitriptylinecodes,
+   severity = )
> amitriptylineresult <- matchdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodelist = headachecodes,
+   drugcodelist = amitriptylinecodes,
+   severity = )
> propranololresult <- matchdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodelist = headachecodes,
+   drugcodelist = propranololcodes,
+   severity = )
> propranololresult <- matchdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodelist = headachecodes,
+   drugcodelist = propranololcodes,
+   severity = )
> propranololresult <- matchdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodelist = headachecodes,
+   drugcodelist = propranololcodes,
+   severity = )

getgenderofpatients

the example presented demonstrates how to identify patients prescribed amitriptyline and patients prescribed propranolol (there is patient overlap, easily controlled for by subsetting) whilst controlling for clinical overlap with or without consideration for off-topic clinical events. with the identified patients, we can, for example, stratify by gender:

> library(rdrugtrajectory)
> library(ggplot )
> ami gender <- getgenderofpatients(amitriptylineresult , agegenderdf)
> ami gender <- getgenderofpatients(amitriptylineresult , agegenderdf)
> ami gender <- getgenderofpatients(amitriptylineresult , agegenderdf)
> prop gender <- getgenderofpatients(propranololresult , agegenderdf)
> prop gender <- getgenderofpatients(propranololresult , agegenderdf)
> prop gender <- getgenderofpatients(propranololresult , agegenderdf)
> amidf <- data.frame(freq=c(nrow(ami gender[ami gender$gender== , ]),
+   nrow(ami gender[ami gender$gender== , ]),
+   nrow(ami gender[ami gender$gender== , ]),
+   nrow(ami gender[ami gender$gender== , ]),
+   nrow(ami gender[ami gender$gender== , ]),
+   nrow(ami gender[ami gender$gender== , ])
+   ),
+   search=c("prescribed","with headache","no comorbidities",
+   "prescribed","with headache","no comorbidities"),
+   drug="amitriptyline",
+   gender=c("male","male","male",
+   "female","female","female")
+   )
> propdf <- data.frame(freq=c(nrow(prop gender[prop gender$gender== , ]),
+   nrow(prop gender[prop gender$gender== , ]),
+   nrow(prop gender[prop gender$gender== , ]),
+   nrow(prop gender[prop gender$gender== , ]),
+   nrow(prop gender[prop gender$gender== , ]),
+   nrow(prop gender[prop gender$gender== , ])
+   ),
+   search=c("at any time","with clinical","clinical & no comorbidities",
+   "at any time","with clinical","clinical & no comorbidities"),
+   drug="propranolol",
+   gender=c("male","male","male",
+   "female","female","female")
+   )
> drugprescriptiondf <- rbind(amidf, propdf)
> ggprescriptionami <- ggplot(drugprescriptiondf[
+   drugprescriptiondf$drug=="amitriptyline",],
+   aes(x=search, y=freq, fill=gender)) +
+   geom_bar(stat="identity", position=position_dodge()) +
+   theme_bw() + xlab("search criteria (severity)") + ylab("patient count") +
+   theme(axis.text.x = element_text(angle= ,hjust= )) +
+   ggtitle("amitriptyline")
> ggprescriptionprop <- ggplot(drugprescriptiondf[
+   drugprescriptiondf$drug=="propranolol",],
+   aes(x=search, y=freq, fill=gender)) +
+   geom_bar(stat="identity", position=position_dodge()) +
+   theme_bw() + xlab("search criteria (severity)") + ylab("patient count") +
+   theme(axis.text.x = element_text(angle= ,hjust= )) +
+   ggtitle("propranolol")

filtering through prescription events can also be controlled by a date range.
for example, if one was calculating the number of patients prescribed amitriptyline per year from to and matched to a headache event, one can apply a date range:

> library(rdrugtrajectory)
> library(ggplot )
> prodcodes <- unique(testtherapydf$prodcode)
> amitriptylinecodes <- prodcodes[ : ]
> #clinical event of interest are headaches.
> medcodelist <- unique(testclinicaldf$medcode)
> #medcodes can be refined further.
> headachecodes <- medcodelist[ : ]
> #dataframes defined for binned dates are constructed by providing all the
> #patients to consider and the binned start and stop date.
> date df <- data.frame(patid=unlist(getuniquepatidlist(testtherapydf)),
+   start=as.Date(as.character(" - - ")),
+   stop=as.Date(as.character(" - - ")))

[figure: two bar charts of patient count against search criteria (severity), with bars filled by gender (female, male). panel a, amitriptyline: prescribed, with headache, no comorbidities. panel b, propranolol: at any time, with clinical, clinical & no comorbidities.]

figure : the number of patients prescribed (a) amitriptyline or (b) propranolol. the criteria used to match against clinical data are indicated: at any time, with a clinical record, and with a clinical record clear of off-topic clinical events.
> date df <- data.frame(patid=unlist(getuniquepatidlist(testtherapydf)),
+   start=as.Date(as.character(" - - ")),
+   stop=as.Date(as.character(" - - ")))
> date df <- data.frame(patid=unlist(getuniquepatidlist(testtherapydf)),
+   start=as.Date(as.character(" - - ")),
+   stop=as.Date(as.character(" - - ")))
> date df <- data.frame(patid=unlist(getuniquepatidlist(testtherapydf)),
+   start=as.Date(as.character(" - - ")),
+   stop=as.Date(as.character(" - - ")))
> date df <- data.frame(patid=unlist(getuniquepatidlist(testtherapydf)),
+   start=as.Date(as.character(" - - ")),
+   stop=as.Date(as.character(" - - ")))
> #retrieve prescription frequencies per binned range
> amitresult <- matchdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodelist = headachecodes,
+   drugcodelist = amitriptylinecodes,
+   severity = ,
+   datedf = date df)
> amitresult <- matchdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodelist = headachecodes,
+   drugcodelist = amitriptylinecodes,
+   severity = ,
+   datedf = date df)
> amitresult <- matchdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodelist = headachecodes,
+   drugcodelist = amitriptylinecodes,
+   severity = ,
+   datedf = date df)
> amitresult <- matchdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodelist = headachecodes,
+   drugcodelist = amitriptylinecodes,
+   severity = ,
+   datedf = date df)
> amitresult <- matchdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodelist = headachecodes,
+   drugcodelist = amitriptylinecodes,
+   severity = ,
+   datedf = date df)
> #the number of patids returned by matchdrugwithdisease is equal to the number
> #of patients with a drug - disease match per year
> datadf <- data.frame(year=c(" "," "," "," "," "),
+   count=c(length(amitresult ),length(amitresult ),
+   length(amitresult ),length(amitresult ),
+   length(amitresult )))
> ggprescriptionyear <- ggplot(datadf, aes(x=year, y=count)) +
+   geom_bar(stat = "identity") + theme_bw()

getpatientswithfirstdrugwithdisease

unlike matchdrugwithdisease, which retrieves patients with a prescription event matching clinical criteria at any time within a cprd ehr record, getpatientswithfirstdrugwithdisease identifies patients with a first prescription event that matches a desired clinical event. please note that care must be taken when searching for medication with off-label uses. for example, beta-blockers are frequently prescribed to treat hypertension and arrhythmia; however, the beta-blocker propranolol is also prescribed to treat migraine.
without in-depth analysis of the patient history, patients prescribed propranolol with records for hypertension or arrhythmia in addition to migraine on an eventdate matching the first propranolol prescription could result in a misleading disease-drug association.

[figure: bar chart of count against year.]

figure : the number of patients prescribed amitriptyline from the start of the year to the end of , stratified in year intervals.

in cases where a health care professional suggests a change in the patient’s lifestyle choices, that patient may have several clinical events free from prescriptions before the first prescription of interest is prescribed. using basic subsetting one can calculate the number of clinical events before the patient’s first prescription intervention (figure a). furthermore, we can stratify patients into subgroups (figure b):

> library(rdrugtrajectory)
> library(ggplot )
> #a vector of prescriptions of interest.
> druglist <- unique(testtherapydf$prodcode)
> sampledrugs <- druglist[ : ]
> #a vector of clinical events to match prescriptions against.
> medcodes <- unique(testclinicaldf$medcode)
> samplemedcodes <- medcodes[ : ]
> #returns the subset of the first prescription event prescribed on the same
> #eventdate as those clinical events of interest
> firstdf <- getpatientswithfirstdrugwithdisease(clinicaldf = testclinicaldf,
+   therapydf = testtherapydf,
+   medcodesvector = samplemedcodes,
+   drugcodesvector = sampledrugs)
> #ensure the only clinical data are for those with an assumed first-drug-disease
> firstclinicaldf <- subset(testclinicaldf,
+   testclinicaldf$patid %in% getuniquepatidlist(firstdf))
> #only keep the diseases of interest
> firstclinicaldf <- subset(firstclinicaldf,
+   firstclinicaldf$medcode %in% samplemedcodes)
> #only keep the prescriptions of interest
> firstdf <- subset(firstdf, firstdf$prodcode %in% sampledrugs)
> idlist <- getuniquepatidlist(firstclinicaldf)
> beforeresultdf <- data.frame(patid=unlist(idlist), freq= )
> for(id in idlist) {
+   #retrieve the clinical/therapy data for each patient, one by one.
+   indclinicaldf <- subset(firstclinicaldf, firstclinicaldf$patid == id)
+   indtherapydf <- subset(firstdf, firstdf$patid == id)
+   #get the first event date on record; this will match a clinical date.
+   firsteventdate <- indtherapydf$eventdate[ ]
+   clinicalbeforetherapydf <- subset(indclinicaldf,
+     indclinicaldf$eventdate < firsteventdate)
+   #number of clinical complaints before first prescription.
+   ncomplaints <- nrow(clinicalbeforetherapydf)
+   beforeresultdf[beforeresultdf$patid==id,]$freq <- ncomplaints
+ }
> ggbefore <- ggplot(beforeresultdf, aes(x=freq)) +
+   geom_histogram(binwidth= , color="black", fill="white") +
+   ylab("patients") + xlab("clinical events before prescription") +
+   theme_bw()
> #note: not every patient will have a clinical imd score.
> imdidsdf <- getimdofpatients(idlist = idlist,
+   imddf = imddf)
> #only work with those with an imd score.
> imdresultsdf <- subset(beforeresultdf,
+   beforeresultdf$patid %in% getuniquepatidlist(imdidsdf))
> imdresultsdf <- imdresultsdf[order(imdresultsdf$patid),]
> imdidsdf <- imdidsdf[order(imdidsdf$patid),]
> imdresultsdf <- cbind(imdresultsdf, imd_score=as.factor(imdidsdf$score))
> ggbeforeimd <- ggplot(imdresultsdf,
+   aes(x=freq, fill=imd_score)) +
+   geom_histogram(binwidth= ) + theme_bw() +
+   ylab("patients") + xlab("clinical events before prescription")

[figure: two histograms of patients against clinical events before prescription. panel a: whole cohort. panel b: bars filled by imd_score.]

figure : the number of clinical events before the first treatment across the whole cohort (a), and by imd score (b).

getmultiprescriptionsamedaypatients

the function getmultiprescriptionsamedaypatients returns all prescription events for those patients prescribed more than two prescriptions on the same date. all events of those patients without a prescription prodcode event can be removed. combining getmultiprescriptionsamedaypatients with getpatientswithfirstdrugwithdisease or matchdrugwithdisease is useful for filtering patients for specific prescription patterns.
for example, to retrieve all patient prescription records if specific prescriptions are (a) never recorded together on the same date and (b) used as a first line treatment for a given complaint:

> library(rdrugtrajectory)
> prodcodesvector = unique(testtherapydf$prodcode)[ : ]
> #ensure only patients with specific prescriptions are returned providing a
> #patient is prescribed those drugs on different dates, never on the same date.
> uniquetherapydf <- getmultiprescriptionsamedaypatients(df = testtherapydf,
+   prodcodesvector = prodcodesvector,
+   removepatientswithoutdrugs = TRUE)
> #ensure that the patients (patid) in the therapy and clinical dataframes
> #are the same. subsetting might not be enough.
> reducedclinicaldf <- subset(testclinicaldf,
+   testclinicaldf$patid %in% getuniquepatidlist(uniquetherapydf))
> #specific medcodes have not been provided. all medcodes in the clinical
> #dataframe are considered. this is possible if either one is not interested
> #in the nature of the clinical complaint or the clinical dataframe has been
> #adjusted to only include clinical complaints of interest.
> firstdf <- getpatientswithfirstdrugwithdisease(clinicaldf = reducedclinicaldf,
+   therapydf = uniquetherapydf,
+   drugcodesvector = sampledrugs)

in the above example, patients with more than one prescription on the same date or without a prescription at all (from the set of desired prescription prodcodes) were removed from the cohort. this reduced the number of patients from patients to .
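the filtering step just described can be sketched in base r. this is a hypothetical illustration, not the implementation of getmultiprescriptionsamedaypatients: it assumes that "more than one prescription on the same date" means two or more distinct prodcodes of interest sharing one eventdate, and all names and values below are fabricated.

```r
# fabricated therapy data; prodcodes 42 and 43 are the drugs of interest
toytherapydf <- data.frame(
  patid     = c(1L, 1L, 2L, 2L, 3L),
  eventdate = as.Date(c("2010-01-05", "2010-01-05", "2010-01-05",
                        "2010-04-01", "2010-02-02")),
  prodcode  = c(42L, 43L, 42L, 43L, 99L)
)
drugsofinterest <- c(42L, 43L)

# keep only the prescriptions of interest
hits <- toytherapydf[toytherapydf$prodcode %in% drugsofinterest, ]

# count distinct drugs of interest per patient and date
key <- paste(hits$patid, hits$eventdate)
ndrugs <- tapply(hits$prodcode, key, function(x) length(unique(x)))
multikeys <- names(ndrugs)[ndrugs > 1]
samedayids <- unique(hits$patid[key %in% multikeys])

# patients with a drug of interest, but never two of them on one date
keptids <- setdiff(unique(hits$patid), samedayids)
uniquetoydf <- toytherapydf[toytherapydf$patid %in% keptids, ]
```

in this toy cohort patient 1 is dropped for receiving both drugs on one date and patient 3 is dropped for never receiving either, leaving patient 2.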
next, only those patients with a first line treatment (first prescription event on the same date as a clinical event) were kept, reducing the sample size to patients.

removepatientsbyduration

longitudinal ehr cohort studies often require careful time-related consideration. currently, rdrugtrajectory presents two functions that identify prescription records of patients that match two time constraints. the first, removepatientsbyduration, removes all patients whose consecutive prescription events are no more than n years apart, or removes patients if the duration between the first and last prescription event on record is less than n years.

> library(rdrugtrajectory)
> df <- removepatientsbyduration(minobsyr = ,
+   minbreakyr = ,
+   therapydf = testtherapydf)

getburninpatients

the second time-related function, getburninpatients, identifies all patient prescription records with at least n days free from prescription events before a specific prescription event. this is useful if one requires a period of time free from prescription intervention before a given prescription event:

> library(rdrugtrajectory)
> drugofinterestvector <- c( , , , , , )
> patientlist <- getburninpatients(df = testtherapydf,
+   startcodesvector = drugofinterestvector,
+   perioddaysbefore = )
> burnintherapydf <- subset(testtherapydf,
+   testtherapydf$patid %in% patientlist)

in the above example, from a cohort of patients, patients had a period of up to days free from prescription events before the first prescription prodcode specified via the startcodesvector argument. the functionality relies on the patient having prescription events before the burn-in period (required to define whether the patient had a cprd record early
enough before the burn-in period began). for example, this patient had over three years of prescription events before the prescription of interest (from - - to - - ), with over days free from exposure before the prescription event of interest, prodcode :

> head(burnintherapydf[burnintherapydf$patid == ,], n= )
[ ] patid eventdate prodcode consid issueseq
< rows> (or -length row.names)

first drug prescriptions

getfirstdrugprescription

a patient’s first prescription event on cprd record can be identified by supplying getfirstdrugprescription with a list of prescription prodcodes. the function returns firstdrugobject, an r s object of type list. only the first prescription event to match any one of the prescription prodcodes provided is identified. the first element of firstdrugobject contains a named list of patid vectors. each vector contains the patids of all those patients that share the same first prescription prodcode. each list element is named after the corresponding prescription prodcode. the second element of firstdrugobject, like the first, is a list of date vectors, each named after the corresponding prescription prodcode. each date vector contains the eventdate of the prescription event for the patient identified by the patid in the identical position of the preceding list. the third element contains a table of prescription frequencies for each first prescription prodcode on record. the prodcode is accompanied by a product description, provided that a file of cprd prescription products has been supplied. below we demonstrate how to retrieve information on first-line treatment:

> library(rdrugtrajectory)
> library(ggplot )
> #an adjusted data dictionary file.
> filelocation <- "product.txt"
> #without supplying a vector of product files all prodcodes in the therapy
> #dataset are considered.
> resultfdo <- getfirstdrugprescription(df = testtherapydf,
+     idlist = null,
+     prodcodesvector = null,
+     descriptionfile = filelocation)
> patidlist <- resultfdo[[ ]]
> eventdatelist <- resultfdo[[ ]]
> drugfrequencydf <- resultfdo[[ ]]
> drugfrequencydf <- drugfrequencydf[order(drugfrequencydf$frequency,
+     decreasing = true), ]
> ggfreq <- ggplot(data=drugfrequencydf, aes(x=description, y=frequency)) +
+     geom_bar(stat="identity") + theme_bw() +
+     theme(axis.text.x = element_text(angle= , hjust= )) +
+     xlab("drug product description")
> #the structure of the firstdrugobject.
> str(resultfdo, strict.width="wrap", list.len = )

[figure: bar chart of frequency (y-axis) against drug product description (x-axis) for amitriptyline, atenolol, candesartan, lisinopril, propranolol, topiramate and venlafaxine products]

figure : the frequency of first line treatment prescription.
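the three-element structure of firstdrugobject described above can be illustrated outside r. the following python sketch is not the package's implementation: the function name first_prescriptions, the tuple layout of the records and the tie-breaking rule (the first record encountered wins when two events share a date) are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import date


def first_prescriptions(records, prodcodes=None):
    """Group patients by the prodcode of their earliest matching prescription.

    records   -- iterable of (patid, eventdate, prodcode) tuples (hypothetical layout)
    prodcodes -- optional set of prodcodes to consider; None means all of them
    Returns (patids_by_code, dates_by_code, frequency), mirroring the three
    elements described in the text.
    """
    first = {}  # patid -> (eventdate, prodcode) of the earliest matching event
    for patid, eventdate, prodcode in records:
        if prodcodes is not None and prodcode not in prodcodes:
            continue
        if patid not in first or eventdate < first[patid][0]:
            first[patid] = (eventdate, prodcode)

    patids_by_code = defaultdict(list)
    dates_by_code = defaultdict(list)
    for patid in sorted(first):
        eventdate, prodcode = first[patid]
        # Same position in both lists refers to the same patient.
        patids_by_code[prodcode].append(patid)
        dates_by_code[prodcode].append(eventdate)

    frequency = {code: len(ids) for code, ids in patids_by_code.items()}
    return dict(patids_by_code), dict(dates_by_code), frequency


if __name__ == "__main__":
    demo = [
        (1, date(2001, 1, 5), 10),
        (1, date(2000, 3, 1), 20),
        (2, date(2002, 1, 1), 10),
        (3, date(1999, 1, 1), 30),
    ]
    print(first_prescriptions(demo, prodcodes={10, 20}))
```

keeping the patid and date lists aligned by position is what allows a later step, such as age stratification, to slice both lists with the same prodcode name.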
getagegroupbyevents

in the next example we explore stratifying first-line prescription events by patient characteristics, such as age, gender, imd, and number of medcodes (for instance, by comorbidities) or prodcodes (for instance, to separate those patients by additional prescriptions), or by any additional clinical event retrieved using cprdlookups.r ?. rdrugtrajectory provides several utility functions to stratify patients (see the reference manual for further information). the function getagegroupbyevents calculates the number of first-line prescription events by patient age. by specifying a set of patids and eventdates from the firstdrugobject, we can calculate the number of first-line prescriptions by age-group for patients linked with a specified medical condition:

> library(rdrugtrajectory)
> filelocation <- "product.txt"
> resultfdo <- getfirstdrugprescription(df = testtherapydf,
+     idlist = null,
+     prodcodesvector = null,
+     descriptionfile = filelocation)
> patidlist <- resultfdo[[ ]]
> eventdatelist <- resultfdo[[ ]]
> names(agegenderdf) <- c("patid","age","gender")
> #the age-groups: [ , ), [ , ), [ , ), ..., [ , +).
> agegroupvector <- c( , , , , , , , , )
> #cprd database release year.
> ageatyear <- " "
> agegrouplist <- getagegroupbyevents(idlist = as.list(patidlist[ : ]),
+     eventdatelist = eventdatelist[ : ],
+     agedf = agegenderdf,
+     agegroupvector = agegroupvector,
+     ageatyear = ageatyear)
> agegrouplist
[[ ]]
- - - - - - - - +
[[ ]]
- - - - - - - - +

in the above example, the age of each patient (agedf) was provided using year-of-birth calculated against the release year of the cprd gold database (explained above). by providing the database release year (in ageatyear) and the first prescription eventdate (in eventdatelist), the age of each patient is adjusted against the prescription eventdate year. finally, by using a list slice on idlist and eventdatelist (individual prescriptions can be specified using their prodcode, for example, eventdatelist$` `), first prescription frequencies by age-group are retrievable (figure ).

> library(ggplot )
> agegroupdrugdf <- data.frame(age=names(agegrouplist[[ ]]),
+     count=unlist(agegrouplist[[ ]]),
+     drug="amitriptyline mg")
> ggamitriptyline <- ggplot(agegroupdrugdf, aes(x=age, y=count)) +
+     geom_bar(stat="identity") +
+     theme_bw() + ggtitle("amitriptyline mg") +
+     theme(axis.text.x = element_text(angle= , hjust= )) +
+     xlab("age-group") + ylab("frequency")

[figure: bar chart of frequency (y-axis) against age-group (x-axis), titled amitriptyline mg]

figure : the distribution of amitriptyline mg as a first-line treatment by age-group.
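the age adjustment and binning described above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the behaviour of getagegroupbyevents itself: the function name age_group_counts, the label format and the decision to skip patients younger than the first boundary are all hypothetical.

```python
from bisect import bisect_right


def age_group_counts(ages_at_release, event_years, release_year, boundaries):
    """Count first-prescription events per half-open age group.

    ages_at_release -- age of each patient in the database release year
    event_years     -- year of each patient's first prescription event
    boundaries      -- sorted lower bounds, e.g. [18, 30, 50] gives
                       [18,30), [30,50) and 50+ (last group is open-ended)
    """
    labels = [f"[{lo},{hi})" for lo, hi in zip(boundaries, boundaries[1:])]
    labels.append(f"{boundaries[-1]}+")
    counts = {label: 0 for label in labels}

    for age, year in zip(ages_at_release, event_years):
        # Shift the age from the release year back to the event year.
        age_at_event = age - (release_year - year)
        idx = bisect_right(boundaries, age_at_event) - 1
        if idx >= 0:  # assumption: patients below the first boundary are skipped
            counts[labels[idx]] += 1
    return [(label, counts[label]) for label in labels]


if __name__ == "__main__":
    print(age_group_counts([40, 60, 25], [2010, 2018, 2000], 2018, [18, 30, 50]))
```

the subtraction inside the loop is the step the text calls adjusting the age against the prescription eventdate year; everything else is ordinary histogram binning.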
prescription sequences

mapdrugtrajectory

identifying patient prescription trajectories in longitudinal ehrs remains our biggest motivator behind the development of rdrugtrajectory. therefore, we developed mapdrugtrajectory to identify the chronological order of patient prescription events. we restrict the calculation to the prescription prodcodes supplied to groupinglist as a named list (named prodcode vectors). the required number of grouped-prescription events is defined by mindepth, and the number of those changes to display is controlled by maxdepth. by keeping mindepth and maxdepth the same, only patients with a valid number of prescription changes are displayed (figure (a) and (c)). patient records with fewer than mindepth changes to prescription sequences are ignored (figure (b)). for further information please refer to the reference manual.

in the code below, mapdrugtrajectory returns patients with at least five grouped prescriptions. prodcodes that have not been grouped are ignored. duplication of prodcodes (those from the same group) does not count as a change in treatment:

figure : the distribution of grouped prodcodes across three patients. (a) five groups of valid prescription prodcodes, (b) only three groups, (c) five valid groups, in addition to prodcodes and which are ignored.
> library(ggplot )
> library(ggalluvial)
> structurelist <- list(amitriptyline = c( , , ),
+     propranolol = c( , , ),
+     topiramate = c( ),
+     venlafaxine = c( , , ),
+     lisinopril = c( , , ),
+     atenolol = c( , , ),
+     candesartan = c( )
+ )
> resultlist <- mapdrugtrajectory(df = testtherapydf,
+     mindepth = ,
+     maxdepth = ,
+     groupinglist = structurelist,
+     removeundefinedcode = true)
> df <- resultlist[[ ]]
> ggswitch <- ggplot(df,
+     aes(y = freq, axis = firstdrug, axis = switch ,
+     axis = switch , axis = switch , axis = switch )) +
+     geom_alluvium(aes(fill = firstdrug), width = / ) +
+     geom_stratum(width = / , fill = "black", color = "grey") +
+     geom_label(stat = "stratum", infer.label = true) +
+     scale_fill_brewer(type = "qual", palette = "set ") +
+     theme_bw() + theme(legend.position = "none") +
+     scale_x_discrete(limits = c("first drug", " st switch", " nd switch",
+     " rd switch"," th switch"),
+     expand = c(. , . )) +
+     ggtitle("migraine preventative switching among patients")

[figure: alluvial plot titled migraine preventative switching among patients; x-axis: first drug, st switch, nd switch, rd switch, th switch; y-axis: freq; strata: amitriptyline, atenolol, candesartan, lisinopril, propranolol, topiramate, venlafaxine]

figure : prescription pattern switching of seven different migraine preventatives.
a patient required a minimum of five changes in prescriptions (including the initial prescription) and, equally, the display was set to five changes in prescription.

prescription timeline construction

rdrugtrajectory contains several functions that transform patient data into a format compatible with mean cumulative function (mcf) semi-parametric estimates, prescription persistence, prescription incidence, and survival analysis.

generatemcfonegroup

prescription events are binned into weekly units to increase the statistical power at each time point. the user presents one group at a time, for example, all clinical events of male patients with a first-line prescription of amitriptyline for a migraine. the clinical data has already been refined using the steps for first-line prescription, as described above. the function generatemcfonegroup accepts a dataframe of events, the mcf start date (eventdates are adjusted so all patient records in the dataset begin at the same time), and the minimum number of events per patient (by default this is two events). the following example presents the calculation of first prescription events, the assignment of gender and the calculation of
mcf of prescription (therapy dataframe) burden of amitriptyline and propranolol:

> library(rdrugtrajectory)
> filelocation <- "product.txt"
> resultlist <- getfirstdrugprescription(df = testtherapydf,
+     idlist = null,
+     prodcodesvector = null,
+     descriptionfile = filelocation)
> patidlist <- resultlist[[ ]]
> eventdatelist <- resultlist[[ ]]
> drugfrequencydf <- resultlist[[ ]]
> drugfrequencydf <- drugfrequencydf[order(drugfrequencydf$frequency,
+     decreasing = true), ]
> amitriptylinepatid <- patidlist$` `
> propranololpatid <- patidlist$` `
> malecode <-
> malepatidsdf <- getgenderofpatients(idlist = getuniquepatidlist(testtherapydf),
+     genderdf = agegenderdf,
+     gendercodevector = malecode)
> amitriptylinemalepatids <- subset(amitriptylinepatid,
+     amitriptylinepatid %in% malepatidsdf$patid)
> propranololmalepatids <- subset(propranololpatid,
+     propranololpatid %in% malepatidsdf$patid)
> amimaletherapydf <- subset(testtherapydf,
+     testtherapydf$patid %in% amitriptylinemalepatids)
> propmaletherapydf <- subset(testtherapydf,
+     testtherapydf$patid %in% propranololmalepatids)
> amimalemcfdf <- generatemcfonegroup(therapydf = amimaletherapydf,
+     startdatecharvector = " - - ",
+     minrecords = )
> propmalemcfdf <- generatemcfonegroup(therapydf = propmaletherapydf,
+     startdatecharvector = " - - ",
+     minrecords = )
> amimalemcfdf <- cbind(amimalemcfdf, drug = "amitriptyline")
> propmalemcfdf <- cbind(propmalemcfdf, drug = "propranolol")
> drugmcfdf <- rbind(amimalemcfdf, propmalemcfdf)
> resultmcf <- reda::mcf(reda::recur(week, id, no.) ~ drug, data = drugmcfdf)
> mcfplot <- reda::plot(resultmcf, conf.int=true) +
+     ggplot ::xlab("weeks") + ggplot ::theme_bw() + ggplot ::ggtitle("")

getfirstdrugincidencerate

prescription incidence can be calculated with getfirstdrugincidencerate. the following code demonstrates how to use a firstdrugobject to calculate incidence rates for a set of prodcodes.
[figure: mcf estimates (y-axis) against weeks (x-axis); legend: drug (amitriptyline, propranolol)]

figure : mcf of drug prescriptions of patients with a first drug prescription for either amitriptyline or propranolol, stratified by gender. the dotted lines indicate a % confidence interval.

the study observation starts from the enrollmentdate and ends at the studyenddate:

> library(rdrugtrajectory)
> filelocation <- "product.txt"
> druglist <- unique(testtherapydf$prodcode)
> requiredprods <- druglist[ : ]
> firstdrugobject <- getfirstdrugprescription(df = testtherapydf,
+     idlist = null,
+     prodcodesvector = requiredprods,
+     descriptionfile = filelocation)
> medhistorydf <- constructmedicalhistory(testclinicaldf, null, testtherapydf)
> patidlist <- unlist(firstdrugobject$patidlist)
> resultmatrix <- getfirstdrugincidencerate(firstdrugobject = firstdrugobject,
+     medhistorydf = medhistorydf,
+     enrollmentdate = as.date(" - - "),
+     studyenddate = as.date(" - - "))
> incidencedf <- as.data.frame(t(resultmatrix), stringsasfactors = true)

the above example returns an incidence rate of . per person years over a cohort of patients.
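an incidence rate expressed per person-years, as reported above, follows the conventional definition of new events divided by time at risk. the python sketch below shows that textbook calculation only; it is not a description of how getfirstdrugincidencerate works internally, and the function name incidence_rate, the exclusion of patients whose first event precedes enrolment and the 365.25-day year are assumptions.

```python
from datetime import date


def incidence_rate(first_event_dates, enrollment, study_end, per=1000):
    """Events per `per` person-years between enrollment and study_end.

    first_event_dates -- one entry per patient: the date of the first
                         prescription, or None if the patient never had one
    """
    events = 0
    person_days = 0
    for event_date in first_event_dates:
        if event_date is not None and event_date < enrollment:
            continue  # assumption: prevalent cases are not at risk in the window
        if event_date is not None and event_date <= study_end:
            events += 1
            # Time at risk stops at the first event.
            person_days += (event_date - enrollment).days
        else:
            # No event in the window: at risk for the whole study period.
            person_days += (study_end - enrollment).days
    person_years = person_days / 365.25
    return per * events / person_years if person_years > 0 else float("nan")


if __name__ == "__main__":
    rate = incidence_rate(
        [None, date(2005, 1, 1), date(1999, 6, 1)],
        enrollment=date(2000, 1, 1),
        study_end=date(2010, 1, 1),
    )
    print(round(rate, 2))
```

censoring at the first event is the design choice that matters here: once a patient has received a first prescription they can no longer contribute a new first event, so they stop contributing person-time.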
for a detailed description please see the details for getfirstdrugincidencerate in the reference manual.

getdrugpersistence

prescription persistence is calculated as the fraction of patients with a prescription for a specific treatment n days after the first prescription event. for example, if we wanted to calculate the fraction of patients with a prescription days after their first prescription, with a -day buffer either side, one specifies a duration of days and a preceding buffer of days (therefore capturing the range to , days either side of one calendar year):

> library(rdrugtrajectory)
> patientlist <- getdrugpersistence(therapydf = testtherapydf,
+     idlist = null,
+     prodcodelist = null,
+     duration = ,
+     buffer = ,
+     endofrecorddate = " - - ")

of patient therapy records, patients had a prescription (+/- ) days after the first prescription event on record, resulting in a crude fraction of only . patients. getdrugpersistence only observes events recorded precisely duration days after the first prescription. the buffer can be used to identify patients who received a prescription shortly after the end of the duration but, more importantly, to ensure that patients actively undergoing treatment (indicated by a prescription shortly before the desired duration days) are included. as the buffer is reduced, the fraction of prescription persistence is reduced until the algorithm attempts to identify only those patients with a prescription exactly duration days after the first prescription. future software updates will incorporate repeat prescription data to increase the accuracy of the calculation.

closing remarks and future work

rdrugtrajectory is an r package which has the potential for exciting applications such as improving clinical decision-making, identifying possible new treatments and analysing outcomes from existing treatments.
we have demonstrated several functions, some of which detail sorting and matching records whilst others demonstrate fundamental statistical analysis. we used fabricated clinical and prescription dataframes, along with the age, gender and index of multiple deprivation score of each patient, and presented analyses of cohort-wide prescription patterns, first-line treatment distributions, how to stratify by patient characteristics, and some basic tools to assist longitudinal analysis of prescriptions. the descriptions presented in this publication are not substitutes for the material in the reference manual. we recommend the reader consults the r ? help command or reference manual before running a function. in particular, functions related to the construction of timelines for survival analysis (time dependent/independent cox regression, kaplan meier survival curves and mean cumulative function) or a matrix for drug incidence rate require fine tuning of several parameters.

[figure: fraction of prescription persistence (y-axis) against buffer size, n days before (x-axis)]

figure : the fraction of prescription persistence adjusted by a buffer number of days before a calendar year. as the buffer approaches the value of duration the fraction approaches .

the latest release of rdrugtrajectory along with source code and reference manual is available for download from https://github.com/acnash/rdrugtrajectory.
whilst active members of the scientific research community we will continue to add new features to rdrugtrajectory whilst making necessary improvements to existing features.

acknowledgements

oxford science innovation, nihr oxford biomedical research centre and nihr oxford health biomedical research centre (informatics and digital health theme, grant brc- - ). thanks to dr michelle hardy for assistance with the article.

references

bally m, dendukuri n, rich b, nadeau l, helin-salmivaara a, garbe e, brophy jm ( ). “risk of acute myocardial infarction with nsaids in real world use: bayesian meta-analysis of individual patient data.” british medical journal, , j . doi: . / bmj.j .

ghosh re, crellin e, beatty s, donegan k, myles p, williams r ( ). “how clinical practice research datalink data are used to support pharmacovigilance.” therapeutic advances in drug safety, , – . doi: . / .

hepp z, dodick dw, varon sf, chia j, matthew n, gillard p, hansen rn, devine eb ( ). “persistence and switching patterns of oral migraine prophylactic medications among patients with chronic migraine: a retrospective claims analysis.” cephalalgia, ( ), – . doi: . / .

oyinlola jo, campbell j, kousoulis aa ( ). “is real world evidence influencing practice? a systematic review of cprd research in nice guidance.” bmc health service research, ( ), – . doi: . /s - - - .
affiliation:
nuffield department of clinical neurosciences, medical sciences division, university of oxford, oxford, uk, ox du
e-mail: anthony.nash@ndcn.ox.ac.uk

journal of statistical software, http://www.jstatsoft.org/
published by the foundation for open access statistics, http://www.foastat.org/
mmmmmm yyyy, volume vv, issue ii; doi: . /jss.v .i
submitted: yyyy-mm-dd; accepted: yyyy-mm-dd

ancestralclust: clustering of divergent nucleotide sequences by ancestral sequence reconstruction using phylogenetic trees

lenore pipes ,∗ and rasmus nielsen , ,
department of integrative biology, university of california-berkeley, berkeley, , usa, department of statistics, university of california-berkeley, berkeley, ca , usa, and globe institute, university of copenhagen, københavn k, denmark
∗to whom correspondence should be addressed.

abstract

motivation: clustering is a fundamental task in the analysis of nucleotide sequences. despite the exponential increase in the size of sequence databases of homologous genes, few methods exist to cluster divergent sequences. traditional clustering methods have mostly focused on optimizing high speed clustering of highly similar sequences.
we develop a phylogenetic clustering method which infers ancestral sequences for a set of initial clusters and then uses a greedy algorithm to cluster sequences.

results: we describe a clustering program ancestralclust, which is developed for clustering divergent sequences. we compare this method with other state-of-the-art clustering methods using datasets of homologous sequences from different species. we show that, in divergent datasets, ancestralclust has higher accuracy and more even cluster sizes than current popular methods.

availability and implementation: ancestralclust is an open source program available at https://github.com/lpipes/ancestralclust

contact: lpipes@berkeley.edu

supplementary information: supplementary figures and table are available online.

introduction

traditional clustering methods such as uclust (edgar, ), cd-hit (fu et al., ), and dnaclust (ghodsi et al., ) use hierarchical or greedy algorithms that rely on user input of a sequence identity threshold. these methods were developed for high speed clustering of a high quantity of highly similar sequences (ghodsi et al., ; li et al., ; edgar, ) and, generally, these methods are considered unreliable for identity thresholds < % because of either the poor quality of alignments at low identities (zou et al., ) or because the performance of the threshold used to count short words drops dramatically with low identities (huang et al., ). at low identities, these methods produce uneven clusters where the majority of sequences are contained in only a few clusters (chen et al., ) and the high variance in cluster sizes reduces the utility of the clustering step for many practical purposes.
clustering of divergent sequences is a fundamental step in genomics analysis because it allows for an early divide-and-conquer strategy that will significantly increase the speed of downstream analyses (zheng et al., ) and clustering of divergent sequences is a frequent request of users of at least one clustering method (huang et al., ). currently, there are no clustering methods that can accurately cluster large taxonomically divergent metabarcoding reference databases such as the barcode of life database (ratnasingham and hebert, ) in relatively even clusters. only a few other methods, such as spclust (matar et al., ) and treecluster (balaban et al., ), exist for clustering potentially divergent sequences. spclust creates clusters based on the use of laplacian eigenmaps and the gaussian mixture model based on a similarity matrix calculated on all input sequences. while this approach is highly accurate, the calculation of an all-to-all similarity matrix is a computationally exhaustive step. treecluster uses user-specified constraints for splitting a phylogenetic tree into clusters. however, treecluster requires an input tree and thus can also be prohibitively slow for large numbers of sequences where a phylogenetic tree is difficult to estimate reliably. with the increasing size of reference databases (schoch et al., ), there is a need for new computationally efficient methods that can cluster divergent sequences.

biorxiv preprint, this version posted january , . doi: https://doi.org/ . / . . . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by . international license, http://creativecommons.org/licenses/by/ . /
here we present ancestralclust, which was specifically developed for clustering of divergent metabarcoding reference sequences in clusters of relatively even size.

methods

to cluster divergent sequences, we developed ancestralclust, which is written in c (figure ). firstly, k random sequences are chosen and the sequences are aligned pairwise using the wavefront algorithm (marco-sola et al., ). a jukes-cantor distance matrix is constructed from the alignments and a neighbor-joining phylogenetic tree is constructed. the jukes-cantor model is chosen for computational speed; more complex models could in principle be used to potentially increase accuracy, at the cost of increased computational time. the c − 1 longest branches in the tree are then cut to yield c clusters. these subtrees comprise the initial starting clusters. the sequences in each starting cluster are aligned in a multiple sequence alignment using kalign (lassmann, ). the ancestral sequence at the root of the tree of each cluster is estimated using the maximum of the posterior probability of each nucleotide using standard programming algorithms from phylogenetics (see e.g., yang, ). the ancestral sequences are used as the representative sequence for each cluster. next, the rest of the sequences are assigned to each cluster based on the shortest nucleotide distance from the wavefront alignment between the sequence and the c ancestral sequences. if the shortest distance to any of the c ancestral sequences is larger than the average distance between clusters, the sequence is saved for the next iteration. we iterate this process until all sequences are assigned to a cluster. in each iteration after the first iteration, a cut of a branch in the phylogenetic tree is chosen if the branch is longer than the average length of branches cut in the first iteration. in practice, only one or two iterations are needed for most data sets if k is defined to be sufficiently large.
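the greedy assignment step described above can be sketched as follows. this is a simplified python illustration, not the c implementation: sequences are assumed to be pre-aligned and of equal length (standing in for the wavefront alignment), the jukes-cantor correction is applied to the assignment distance as an assumption, and the names p_distance, jukes_cantor and assign_to_ancestors are hypothetical.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two pre-aligned sequences.

    Sites where either sequence has a gap ('-') are ignored.
    """
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differing += x != y
    return differing / compared if compared else 1.0


def jukes_cantor(p):
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3); infinite once p >= 3/4."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def assign_to_ancestors(sequences, ancestors, threshold):
    """Assign each sequence to its nearest ancestral sequence.

    Sequences whose nearest ancestor is farther than `threshold` (in the text,
    the average distance between clusters) are deferred to the next iteration.
    """
    clusters = {i: [] for i in range(len(ancestors))}
    deferred = []
    for seq in sequences:
        dists = [jukes_cantor(p_distance(seq, anc)) for anc in ancestors]
        best = min(range(len(ancestors)), key=dists.__getitem__)
        if dists[best] > threshold:
            deferred.append(seq)
        else:
            clusters[best].append(seq)
    return clusters, deferred


if __name__ == "__main__":
    ancestors = ["AAAAAAAA", "CCCCCCCC"]
    reads = ["AAAAAAAT", "CCCCCCCC", "GGGGGGGG"]
    print(assign_to_ancestors(reads, ancestors, threshold=0.5))
```

comparing each sequence against only c ancestral sequences, rather than against every other sequence, is what keeps this step linear in the number of sequences; the deferred list corresponds to the sequences that seed the next iteration.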
we compared ancestralclust to five other state-of-the-art clustering methods: uclust (edgar, ), meshclust (james and girgis, ), dnaclust (ghodsi et al., ), cd-hit (fu et al., ), and spclust (matar et al., ). we used a variety of measurements to assess the accuracy and evenness of the clustering. we calculated two traditional measures of accuracy, purity and normalized mutual information (nmi), used in bonder et al. ( ). the purity of clusters is calculated as:

\[ \mathrm{purity}(\Omega, C) = \frac{1}{N} \sum_{k} \max_{j} \left| \omega_k \cap c_j \right| \]

where $\Omega = \{\omega_1, \omega_2, \ldots, \omega_K\}$ is the set of clusters, $C = \{c_1, c_2, \ldots, c_J\}$ is the set of taxonomic classes and $N$ is the total number of sequences. nmi is calculated as:

\[ \mathrm{NMI}(\Omega, C) = \frac{I(\Omega, C)}{\left[ H(\Omega) + H(C) \right] / 2} \]

where $I(\Omega, C)$ is the mutual information gain and $H$ is the entropy function. to measure the evenness of the clusters, we used the coefficient of variation, which is calculated as:

\[ \mathrm{CV} = \frac{\sqrt{\sum_{i=1}^{J} (n_i - m)^2 / J}}{m} \]

where $n_i$ is the number of sequences in cluster $i$, $J$ is the total number of clusters, and $m$ is the mean size of the clusters. we also used a taxonomic incompatibility measure to assess the accuracy of the clusters. let a,b be a pair of species found in cluster i. incompatibility at a given taxonomic rank is calculated by first identifying the number of times a and b exist in clusters other than cluster i. the total incompatibility is calculated by summing over all pairs of sequences (a,b) and all i. both nmi and taxonomic incompatibility are very sensitive to the number of clusters and also to unevenness of cluster sizes. to allow fair comparison when the number of clusters and the evenness of cluster sizes vary, we therefore calculate the relative nmi and relative incompatibility. these measures are calculated by scaling them relative to their expected values under random assignments given the number of clusters and the cluster sizes.
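the measures defined above are short enough to compute directly. the python sketch below follows the standard definitions of purity, nmi and the coefficient of variation, and approximates the relative nmi by shuffling cluster labels, which keeps the cluster sizes fixed. the function names, the number of random shuffles and the handling of degenerate cases (zero entropy, zero expected nmi) are assumptions and are not taken from the ancestralclust code.

```python
import math
import random
from collections import Counter


def purity(clusters, classes):
    """Fraction of sequences belonging to the majority class of their cluster."""
    by_cluster = {}
    for k, c in zip(clusters, classes):
        by_cluster.setdefault(k, Counter())[c] += 1
    return sum(max(cnt.values()) for cnt in by_cluster.values()) / len(clusters)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((v / n) * math.log(v / n) for v in Counter(labels).values())


def nmi(clusters, classes):
    """Mutual information divided by the mean of the two entropies."""
    n = len(clusters)
    joint = Counter(zip(clusters, classes))
    pk, pc = Counter(clusters), Counter(classes)
    info = sum((v / n) * math.log(v * n / (pk[k] * pc[c]))
               for (k, c), v in joint.items())
    denom = (entropy(clusters) + entropy(classes)) / 2
    return info / denom if denom > 0 else 0.0  # convention chosen for this sketch


def coefficient_of_variation(clusters):
    """Population standard deviation of cluster sizes divided by their mean."""
    sizes = list(Counter(clusters).values())
    m = sum(sizes) / len(sizes)
    return math.sqrt(sum((s - m) ** 2 for s in sizes) / len(sizes)) / m


def relative_nmi(clusters, classes, n_random=100, seed=0):
    """Raw NMI divided by the mean NMI of size-preserving random assignments."""
    rng = random.Random(seed)
    shuffled = list(clusters)
    total = 0.0
    for _ in range(n_random):
        rng.shuffle(shuffled)  # permuting labels keeps every cluster size fixed
        total += nmi(shuffled, classes)
    expected = total / n_random
    return nmi(clusters, classes) / expected if expected > 0 else float("inf")


if __name__ == "__main__":
    clusters = [0, 0, 0, 1]
    classes = ["a", "a", "b", "b"]
    print(purity(clusters, classes), nmi(clusters, classes),
          coefficient_of_variation(clusters))
```

the same shuffling scheme could be reused for relative incompatibility by swapping nmi for an incompatibility function.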
we estimated relative nmi by dividing the raw nmi score by the average nmi of clusterings in which sequences have been assigned at random with equal probability to clusters, such that the cluster sizes are the same as the cluster sizes produced in the original clustering. the same procedure was used to convert the taxonomic incompatibility measure into relative incompatibility.

results

to first assess performance of clustering methods on divergent nucleotide sequences, we used random samples of , sequences from three metabarcode reference databases ( , s, and cytochrome oxidase i (coi)) from the caledna project (meyer et al., ). we chose to compare our method on this dataset against uclust because it is the most widely used clustering program and it performs better than cd-hit on low identity thresholds (chen et al., ). we first compared ancestralclust against uclust using relative nmi and coefficient of variation (figure ). we used k = random initial sequences, which is % of the total number of sequences in each sample, and c = cuts in the initial phylogenetic tree. notice that the relative nmi tends to be higher with a lower coefficient of variation for ancestralclust across all barcodes. this suggests that, for these divergent edna sequences, ancestralclust provides clusterings that are more even in size and that are more consistent with conventional taxonomic assignment. as a second measure of accuracy we measured relative incompatibility and coefficient of variation using ancestralclust and uclust for the same datasets under the same running conditions.
notice in figure , ancestralclust tends to create balanced clusters with lower relative taxonomic incompatibilities compared to uclust at all taxonomic levels. similar results are seen for metabarcode s (fig s ). however, for metabarcode s (fig s ), ancestralclust performs noticeably better than uclust at the species, genus, and family levels, but at the order, class, and phylum levels it performs either the same or worse. also, at the species, genus, and family levels, it is apparent that as the uclust clusters approach a lower coefficient of variation, the relative incompatibility increases dramatically.

next, we analyzed two datasets with different properties: one dataset of diverse species from the same gene and another dataset of homologous genes from species of the same phyla. in the first dataset, we expect the sequences to cluster according to species. in the second dataset, we expect the sequences to cluster according to genes. we compared ancestralclust to four commonly used clustering programs (uclust, meshclust , cd-hit , and dnaclust) and one clustering program designed for divergent sequences, spclust. the first dataset contained , sequences from the coi caledna database from divergent species that were from different phyla and different classes, and the second dataset contained sequences from different genes from taxonomically similar species. first, we compared all methods using , coi sequences from the different species (table ). we expect these sequences to form different clusters, each including all the sequences from one species. we chose identity thresholds to enforce the expected number of clusters for each method. we were unable to form clusters using cd-hit because the program does not allow clustering of sequences with identity thresholds < % at default parameters. for spclust, we used the three precision modes available for the method.
in this analysis, ancestralclust achieved a perfect clustering (the purity was and relative incompatibility was ), although it was the second slowest method and had the second lowest memory requirements. uclust was one of the fastest methods and used the least amount of memory but had the second lowest purity and the third highest relative nmi value. meshclust had no incompatibilities and the second highest purity and relative nmi values but was the third slowest method. dnaclust had the most uneven clusters and the second lowest relative nmi value with the highest relative incompatibility. spclust only identified one cluster, with a computational time of ~ days. in comparison, ancestralclust took ~ minutes and uclust used < second.

next, we analyzed ’genomic set ’ from matar et al. ( ), which consists of sequences from homologous genes (fcer g, s a , s a , s a , s a , and sh bgrl in table ). we expect these sequences to form clusters. we varied the identity thresholds for uclust and meshclust using thresholds . , . , and . . for cd-hit, we used the lowest identity threshold available on default parameters, which is . . we were unable to use dnaclust for this analysis because it cannot handle sequences longer than bp (the average sequence length was , . bp and the longest sequence was , bp). since this dataset contained different genes, we calculated relative nmi using genes as the classes and did not use incompatibility as an accuracy measure. only ancestralclust, uclust, and meshclust produced the expected number of clusters, and among the methods that created the expected number of clusters, ancestralclust had the highest purity value. ancestralclust was the second slowest method and had the highest memory requirements, which is due to the wavefront alignment algorithm, which is $O(s^2)$ in memory requirements, where s is the alignment score. since alignments were performed using different genes that were longer than . kb, this resulted in a high value of s.
spclust had the highest relative nmi using all precision modes and the same purity as ancestralclust for its moderate and maximum precision modes; however, it failed to produce the expected number of clusters.

conclusions

we developed a phylogenetic-based clustering method, ancestralclust, specifically to cluster divergent metabarcode sequences. we performed a comparative study between ancestralclust and widely used clustering programs such as uclust, cd-hit, dnaclust, meshclust , and, for divergent sequences, spclust. uclust and dnaclust are substantially faster than ancestralclust and should be the preferred methods if computational speed is the main concern. however, ancestralclust tends to form clusters of more even size with lower taxonomic incompatibility and higher nmi than other methods for the relatively divergent sequences analyzed here. we recommend the use of ancestralclust when sequences are divergent, especially if a relatively even clustering is also desirable, for example for various divide-and-conquer approaches where the computational time of downstream analyses increases faster than linearly with cluster size.

acknowledgements

this work used the extreme science and engineering discovery environment (xsede) bridges system at the pittsburgh supercomputing center through allocation bio .

references

balaban, m., moshiri, n., mai, u., jia, x., and mirarab, s. ( ). treecluster: clustering biological sequences using phylogenetic trees. plos one, ( ), e .
bonder, m. j., abeln, s., zaura, e., and brandt, b. w. ( ). comparing clustering and pre-processing in taxonomy analysis. bioinformatics, ( ), – .
chen, q., wan, y., zhang, x., lei, y., zobel, j., and verspoor, k. ( ). comparative analysis of sequence clustering methods for deduplication of biological databases. j. data and information quality, ( ).
edgar, r. c. ( ). search and clustering orders of magnitude faster than blast. bioinformatics, ( ), – .
fu, l., niu, b., zhu, z., wu, s., and li, w.
( ). cd-hit: accelerated for clustering the next-generation sequencing data. bioinformatics, ( ), – .
ghodsi, m., liu, b., and pop, m. ( ). dnaclust: accurate and efficient clustering of phylogenetic marker genes. bmc bioinformatics, ( ), – .
huang, y., niu, b., gao, y., fu, l., and li, w. ( ). cd-hit suite: a web server for clustering and comparing biological sequences. bioinformatics, ( ), – .
james, b. t. and girgis, h. z. ( ). meshclust : application of alignment-free identity scores in clustering long dna sequences. biorxiv, page .
lassmann, t. ( ). kalign : multiple sequence alignment of large datasets.
li, w., jaroszewski, l., and godzik, a. ( ). clustering of highly homologous sequences to reduce the size of large protein databases. bioinformatics, ( ), – .
marco-sola, s., moure lópez, j. c., moreto planas, m., and espinosa morales, a. ( ). fast gap-affine pairwise alignment using the wavefront algorithm. bioinformatics, (btaa ), – .
matar, j., khoury, h. e., charr, j.-c., guyeux, c., and chrétien, s. ( ). spclust: towards a fast and reliable clustering for potentially divergent biological sequences. computers in biology and medicine, , .
meyer, r. s., curd, e. e., schweizer, t., gold, z., ramos, d. r., shirazi, s., kandlikar, g., kwan, w.-y., lin, m., freise, a., et al. ( ). the california environmental dna “caledna” program. biorxiv, page .
ratnasingham, s. and hebert, p. d. ( ). bold: the barcode of life data system (http://www.barcodinglife.org). molecular ecology notes, ( ), – .
schoch, c. l., ciufo, s., domrachev, m., hotton, c.
l., kannan, s., khovanskaya, r., leipe, d., mcveigh, r., o’neill, k., robbertse, b., et al. ( ). ncbi taxonomy: a comprehensive update on curation, resources and tools. database, .
yang, z. ( ). molecular evolution: a statistical approach. oxford university press.
zheng, w., mao, q., genco, r. j., wactawski-wende, j., buck, m., cai, y., and sun, y. ( ). a parallel computational framework for ultra-large-scale sequence clustering analysis. bioinformatics, ( ), – .
zou, q., lin, g., jiang, x., liu, x., and zeng, x. ( ). sequence clustering in bioinformatics: an empirical study. briefings in bioinformatics, ( ), – .

figure . overview of ancestralclust. in ( ), k random sequences are chosen for the initial clusters. in ( ), a distance matrix is constructed using the k sequences. using the distance matrix, a neighbor-joining tree is constructed and c − cuts are made to create c clusters. in ( ), a multiple sequence alignment is built for each cluster and the ancestral sequences are reconstructed in the root node of each tree. the rest of the unassigned sequences are then aligned to the ancestral sequences of each cluster and the shortest distance to each ancestral sequence is calculated. the process is iterated until all sequences are assigned to a cluster.

figure . relative nmi against coefficient of variation for ancestralclust and uclust for samples of , randomly chosen s, s, and coi reference sequences from the caledna project (meyer et al., ). the similarity threshold for uclust was . . for ancestralclust, we used initial random sequences with initial clusters.
relative nmi was calculated by dividing nmi by the average of random samples of the same fixed cluster size.

figure . relative incompatibility against coefficient of variation for ancestralclust and uclust for samples of , randomly chosen coi reference sequences. coi reference sequences are from the caledna project (meyer et al., ). the similarity threshold for uclust was . . for ancestralclust, we used initial random sequences with initial clusters.

table . comparisons of clustering methods using , coi sequences from different species. the list of species can be found in table s . incompatibility was calculated at the taxonomic rank of species. for uclust, meshclust , and dnaclust, the identity thresholds were chosen to force the expected number of clusters. for cd-hit, the lowest possible identity was chosen, which is . . in the case of spclust, the coefficient of variation cannot be calculated for cluster. spclust clusters were created with version .

columns: method; # of clusters; time (sec); mem (mb); purity; relative incompat. (species); relative nmi; coeff. of var.
ancestralclust . . . .
uclust < . . . . .
meshclust . . . . .
cd-hit . . . . .
dnaclust < . . . . .
spclust (fast) . . -
spclust (moderate) . . -
spclust (maxprecision) . . -
table . comparisons of clustering methods using sequences from homologous genes from matar et al. ( ). ’id’ refers to the identity threshold used. we used identity thresholds of . , . , and . for uclust and meshclust . we used precision levels of fast, moderate, and maximum for spclust using version since version only produced cluster for all modes. dnaclust has a maximum sequence length of bp and could not be used on this dataset.

columns: method; # of clusters; time (sec); memory (mb); purity; relative nmi; coefficient of variation
ancestralclust . . . . .
uclust (id= . ) . . . .
uclust (id= . ) . . . .
uclust (id= . ) . . . . .
meshclust (id= . ) . . . . .
meshclust (id= . ) . . . . .
meshclust (id= . ) . . . . .
spclust (fast) . . . . .
spclust (moderate) . . . . .
spclust (max precision) . . . . .
cd-hit (id= . ) . . . . .
dnaclust - - - - - -

competitive binding of stats to receptor phospho-tyr motifs accounts for altered cytokine responses in autoimmune disorders

stephan wilmes*, polly-anne jeffrey*, jonathan martinez-fabregas, maximillian hafer, paul fyfe, elizabeth pohler, silvia gaggero, martín lópez-garcía, grant lythe, thomas guerrier, david launay, mitra suman, jacob piehler, carmen molina-parís# and ignacio moraga#

division of cell signalling and immunology, school of life sciences, university of dundee, dundee, uk.
department of applied mathematics, school of mathematics, university of leeds, leeds, uk.
department of biology and centre of cellular nanoanalytics, university of osnabrück, osnabrück, germany.
université de lille, inserm umr cnrs umr –canther and institut pour la recherche sur le cancer de lille (ircl), lille, france.
univ. lille, inserm, chu lille, u - infinite - institute for translational research in inflammation, f- lille, france.
* these authors contributed equally to this work
# these authors share senior authorship

abstract

cytokines elicit pleiotropic and non-redundant activities despite strong overlap in their usage of receptors, jak and stat molecules. we use il- and il- to ask how two cytokines activating the same signaling pathway have different biological roles. we found that il- induces more sustained stat phosphorylation than il- , with the two cytokines inducing comparable levels of stat phosphorylation. mathematical and statistical modelling of il- and il- signaling identified stat binding to gp , and stat binding to il- ra, as the main dynamical processes contributing to sustained pstat by il- . mutation of tyr on il- ra decreased il- -induced stat phosphorylation by % but had limited effect on stat phosphorylation. strong receptor/stat coupling by il- initiated a unique gene expression program, which required sustained stat phosphorylation and irf expression and was enriched in classical interferon stimulated genes. interestingly, the stat/receptor coupling exhibited by il- /il- was altered in patients with systemic lupus erythematosus (sle). il- /il- induced a more potent stat activation in sle patients than in healthy controls, which correlated with higher stat expression in these patients. partial inhibition of jak activation by sub-saturating doses of tofacitinib specifically lowered the levels of stat activation by il- .
our data show that receptor and stats concentrations critically contribute to shape cytokine responses and generate functional pleiotropy in health and disease.

biorxiv preprint doi: https://doi.org/ . / . . . ; this version posted january , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by . international license.
introduction

il- and il- both have intricate functions regulating inflammatory responses ( ). il- is a hetero-dimeric cytokine comprised of p and ebi subunits ( ). il- exerts its activities by binding gp and il- rα receptor subunits on the surface of responsive cells, triggering the activation of the jak /stat /stat signaling pathway. il- elicits both pro- and anti-inflammatory responses, although the latter activity seems to be the dominant one ( ). il- stimulation inhibits rorgt expression, thereby suppressing th- commitment and limiting subsequent production of pro-inflammatory il- ( , ). moreover, il- induces a strong production of anti-inflammatory il- by (tbet+ and foxp -) tr- cells ( - ), further contributing to limit the inflammatory response. il- engages a hexameric receptor complex comprised of two copies each of il- ra, gp and il- ( ), triggering the activation, as il- does, of the jak /stat /stat signaling pathway. however, in contrast to il- , il- is known as a paradigm pro-inflammatory cytokine ( , ). il- inhibits lineage differentiation to treg cells ( ) while promoting th- ( , ), thus supporting its pro-inflammatory role. how il- and il- elicit opposite immuno-modulatory activities despite activating almost identical signaling pathways is currently not completely understood.

the relative and absolute stat activation levels seem to have intricate roles, which lead to a strong signaling and functional plasticity by cytokines.
although il- robustly activates stat , it is capable of mounting a considerable stat response as well ( ). moreover, in the absence of stat , il- induces a strong stat response comparable to ifng, a prototypic stat activating cytokine ( ). likewise, the absence of stat potentiates the stat response for il- , which normally elicits a strong stat response, so that it mounts an il- -like response ( ). furthermore, negative feedback mechanisms such as socs proteins and phosphatases have been described as critical players influencing stat and stat phosphorylation kinetics and thereby shaping their signal integration for gp -utilizing cytokines ( - ). yet, how all these molecular components are integrated by a given cell to produce the desired response is still an open question.

among the il- /il- cytokine family, il- exhibits a unique stat activation pattern. the majority of gp -engaging cytokines preferentially activate stat , with activation of stat being an accessory or balancing component ( , ). il- , however, triggers stat and stat activation with high potency ( ). indeed, different studies have shown that il- responses rely on either stat ( - ) or stat activation ( , ). moreover, recent transcriptomics studies showed that in the absence of stat , il- and il- lost more than % of target gene induction. yet, stat was the main factor driving the specificity of the il- versus the il- response, highlighting a critical interplay of stat and stat engagement ( ).

while the biological responses induced by il- and il- have been extensively studied ( , ), the very initial steps of signal activation and kinetic integration by these two cytokines have not been comprehensively analysed. since the different biological outcomes elicited by il- and il- are most likely encoded in the early events of cytokine stimulation, here we specifically aimed to identify the molecular determinants underlying functional selectivity by il- in human t-cells.
we asked how a defined cytokine stimulus is propagated in time over multiple layers of signaling to produce the desired response. to this end, we probed il- and il- signaling at different scales, ranging from cell surface receptor assembly and early stat / effector activation to an unbiased and quantitative multi-omics approach: phospho-proteomics after early cytokine stimulation, kinetics of transcriptomic changes and alteration of the t-cell proteome upon prolonged cytokine exposure.

il- and il- induced similar levels of assembly of their respective receptor complexes, which resulted in comparable phosphorylation of stat by the two cytokines. il- , on the other hand, triggered a more sustained stat phosphorylation. to decipher the molecular events that determine sustained stat phosphorylation by il- , we mathematically modelled the stat and stat signaling kinetics induced by each of these cytokines. we identified differential binding of stat and stat to il- ra and gp , respectively, as the main factor contributing to a sustained stat activation by il- . at the transcriptional level, il- triggered the expression of a unique gene program, which strictly required the cooperative action of sustained pstat and irf expression to drive the induction of an interferon-like gene signature that profoundly shaped the t-cell proteome. interestingly, our mathematical models of il- and il- signaling predicted that changes in receptor and stat expression could fundamentally change the magnitude and timescale of the il- and il- responses.
we found high levels of stat expression in sle patients when compared to healthy donors, which correlated with biased stat responses induced by il- and il- in these patients. strikingly, we could specifically inhibit stat activation by il- using suboptimal doses of the jak inhibitor tofacitinib. this could provide a new strategy to specifically target individual stats engaged by cytokines.

results:

il- induces a more sustained stat activation than hypil- in human th- cells

il- and il- are critical immuno-modulatory cytokines. while il- engages a hexameric surface receptor comprised of two molecules of il- ra and two molecules of gp to trigger the activation of stat and stat transcription factors (figure a), il- binds gp and il- ra to trigger activation of the same stat molecules (figure a). despite sharing a common receptor subunit, gp , and activating similar signaling pathways, these two cytokines exhibit non-redundant immuno-modulatory activities, with il- eliciting a potent pro-inflammatory response and il- acting more as an anti-inflammatory cytokine. here, we set out to investigate the molecular rules that determine the functional specificity elicited by il- and il- using human th- cells as a model experimental system. because recombinant expression of human il- is challenging, we recombinantly produced a murine single-chain variant of il- (p and ebi ), which cross-reacts with the human receptors and triggers potent signaling, comparable to the signaling output produced by commercial human il- ( ) (supp. fig. a).
in addition, we have used a linker-connected single-chain fusion protein of il- ra and il- termed hyperil- (hypil- ) ( ) to diminish il- signaling variability due to changes in il- ra expression during t cell activation ( ). cd + t cells from human buffy coat samples were isolated by magnetic activated cell sorting (macs) and grown under th- polarizing conditions. th- cells were then used to study in vitro signaling by il- and il- (supp. fig. b). we took advantage of a barcoding methodology allowing high-throughput multiparameter flow cytometry to perform detailed dose/response and kinetics studies induced by hypil- and il- in th- cells ( ) (supp. fig. b). dose-response experiments with il- and hypil- on th- cells showed concentration-dependent phosphorylation of stat and stat . phosphorylation of stat / was more sensitive to activation by il- , with an ec of ~ pm compared to ~ pm for hypil- (figure b). despite this difference in sensitivity, both cytokines yielded the same activation amplitude for pstat . for pstat , however, we observed a significantly reduced maximal amplitude for hypil- relative to il- (figure b). we next performed kinetic studies to assess whether the poor stat activation by hypil- was a result of different activation kinetics. for stat , we saw the peak of phosphorylation after ~ - minutes, followed by a gradual decline. both cytokines exhibited an almost identical sustained pstat profile, with ~ % of activation still seen after h of continuous stimulation. interestingly, il- not only activated stat with higher amplitude but also in a more sustained manner than hypil- (figure c). this could be better appreciated when pstat levels were normalized to maximal mfi for each cytokine, with il- clearly inducing a more sustained phosphorylation of stat than hypil- (supp. fig. c). the same phenotype was observed in other t-cell subsets of activated pbmcs (supp. fig. d).
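the normalization to maximal mfi mentioned above can be sketched as follows. this is an illustration only: the function names are hypothetical, the mfi values are invented placeholders rather than measurements from this study, and the "fraction remaining at the last time point" is just one simple way to compare how sustained two time courses are.

```python
def normalize_to_max(mfi):
    """Scale a time course so that its peak equals 1.0."""
    peak = max(mfi)
    if peak <= 0:
        raise ValueError("peak mfi must be positive")
    return [value / peak for value in mfi]


def fraction_remaining(mfi):
    """Signal at the last time point expressed as a fraction of the peak."""
    return normalize_to_max(mfi)[-1]


# hypothetical time courses (arbitrary mfi units), not data from the paper
sustained_course = [0.0, 800.0, 1000.0, 900.0, 700.0, 600.0]
transient_course = [0.0, 400.0, 500.0, 300.0, 150.0, 50.0]

if __name__ == "__main__":
    print(normalize_to_max(sustained_course))
    print(normalize_to_max(transient_course))
    print(fraction_remaining(sustained_course), fraction_remaining(transient_course))
```

normalizing each cytokine to its own peak removes the amplitude difference, so that only the shape of the decay is compared.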
as cell surface gp levels are significantly reduced upon t-cell activation ( ), we next investigated whether the transient stat activation profile induced by hypil- resulted from limited availability of gp . for that we generated a rpe cell clone stably expressing ten times higher levels of gp on its surface (figure d, right panel). stimulation of this rpe clone with hypil- resulted in a more sustained activation of stat , with very little effect on stat activation kinetics when compared to rpe wild type cells, suggesting that gp receptor density does not contribute to the transient stat activation kinetics elicited by hypil- (figure d).

ligand-induced cell-surface receptor assembly by il- and hypil- 

we next investigated whether il- and hypil- elicited differential cell surface receptor engagement that could explain their distinct signaling output. for that, we measured the dynamics of receptor assembly in the plasma membrane of live cells by simultaneous dual-colour total internal reflection fluorescence (tirf) imaging. rpe cells were chosen as a model experimental system since they do not express endogenous il- ra (supp. fig. e). we used previously described rpe gp ko cells (supp. fig. a) ( ) to transfect and express tagged variants of il- ra and gp , to allow quantitative site-specific fluorescence cell surface labelling by dye-conjugated nanobodies (nbs) (figure e) as recently described in ( ). for both il- ra and gp we found a random distribution and unhindered lateral diffusion of individual receptor monomers (figure f).
single molecule co-localization combined with co-tracking analysis was then used to identify correlated motion of il- ra and gp which was taken as a readout for receptor heterodimer formation ( ) (figure f, figure supp. movie ). in the resting state, we did not observe pre-assembly of il- ra and gp . however, after stimulation with il- we found substantial heterodimerization (figure f & g, supp. fig. b, figure supp. movie & ). at elevated laser intensities, bleaching analysis of individual complexes confirmed a one-to-one ( : ) complex stoichiometry of il- ra and gp , whereas single-molecule förster resonance energy transfer (fret) further corroborated close molecular proximity of the two receptor chains (figure h). we also observed association and dissociation events of receptor heterodimers, pointing to a dynamic equilibrium between monomers and dimers as proposed for other heterodimeric cytokine receptor systems ( , ) (figure supp. movie ). to measure homodimerization of gp by hypil- , we stochastically labelled gp with equal concentrations of the same nb species conjugated to either of the two dyes ( ). we saw strong homodimerization of gp after stimulation with hypil- (figure g, supp. fig. b, figure supp. movie ). homodimerization was confirmed either by single-color dual-step bleaching or dual-color single-step bleaching as shown for other homodimeric cytokine receptors (supp. fig. c) ( ). for both cytokine receptor systems, we saw a cytokine-induced reduction of the diffusion mobility, which has been ascribed to increased friction of receptor dimers diffusing in the plasma membrane. however, we note that hypil- stimulation impaired diffusion of gp more strongly than il- did, possibly indicating faster receptor internalization (supp. fig. d). based on the dimerization data, we were able to calculate the two-dimensional equilibrium dissociation constants ($K_D^{2D}$)
according to the law of mass action for a dynamic monomer-dimer equilibrium. for il- -induced heterodimerization of il- ra and gp , we calculated a $K_D^{2D}$ of ~ . µm- . in activated t-cells with high levels and a significant excess of il- ra over gp , this $K_D^{2D}$ ensures strong receptor assembly by il- ( ). the $K_D^{2D}$ for gp homodimerization by hypil- was ~ . µm- . this higher affinity is most likely due to the two high-affinity binding sites engaged in the hexameric receptor complex ( ). however, in t-cells the expression of gp can be particularly low, thus, probably limiting hypil- . taken together, these experiments marked ligand-induced receptor assembly as the initial step triggering downstream signaling for both il- and hypil- , with no obvious differences in their receptor activation mechanism which could support the observed more sustained stat activation elicited by il- .

mathematical and statistical analysis of hypil- and il- induced stat kinetic responses

to gain further insight into the molecular rules and kinetics that define il- sustained stat phosphorylation, we developed two mathematical models of the initial steps of hypil- and il- receptor-mediated signaling, respectively. the mathematical model for each cytokine considers the following events: i) cytokine association and dissociation to a receptor chain (figure a, supp. fig. a and b, top panel), ii) cytokine-induced dimer association and dissociation (supp. fig. a and b, bottom panel), iii) stat (or stat ) binding and unbinding to dimer (supp. fig. c and d), iv) stat (or stat ) phosphorylation when bound to dimer (supp.
fig. c and d), v) internalisation/degradation of complexes (supp. fig. e and f), and vi) dephosphorylation of free stat (or stat ) (supp. fig. g). details of model assumptions, model parameters and parameter inference have been provided in the material and methods under mathematical models and bayesian inference. we first wanted to explore if there existed a potential feedback mechanism in the way in which receptor molecules are internalised/degraded over time. to this end, and for each cytokine model, we considered two hypotheses: hypothesis assumes that receptor complexes (supp. fig. e and f) are internalised with rate proportional to the concentration of the species in which they are contained (e.g., different dimer types), and hypothesis , that receptor complexes are internalised with rate proportional to the product of the concentration of the species in which they are contained and the sum of the concentrations of free phosphorylated stat and stat . hypothesis is consistent with a negative feedback mechanism in which pstat molecules translocate to the nucleus, where they increase the production of negative feedback proteins such as socs . as described in the material and methods (mathematical models and bayesian inference) we made use of the rpe experimental data set to carry out mathematical model selection for the two different hypotheses. we found that hypothesis could explain the data better than hypothesis , with a probability of %. this result can be seen in figure b, in which we plot, for different values of the distance threshold between the mathematical model output and the data (see mathematical models and bayesian inference in material and methods, for details), the relative probability of each hypothesis, where hypothesis is denoted 𝐻# and hypothesis is denoted 𝐻". 
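the two internalisation hypotheses described above differ only in the form of the rate law, which can be written out as a minimal python sketch. the function names, the rate-constant names `beta` and `gamma`, and the numerical values are hypothetical placeholders chosen for illustration; the actual parameterisation is the one given in the material and methods.

```python
def internalisation_rate_linear(beta, complex_conc):
    """Rate proportional to the concentration of the receptor-containing
    species only (no feedback). `beta` is a hypothetical rate constant."""
    return beta * complex_conc


def internalisation_rate_feedback(gamma, complex_conc, free_pstat_a, free_pstat_b):
    """Rate proportional to the product of the species concentration and
    the summed concentrations of the two free phosphorylated STATs.
    `gamma` is a hypothetical rate constant."""
    return gamma * complex_conc * (free_pstat_a + free_pstat_b)


# Illustrative values only (arbitrary units), not inferred parameters.
rate_no_feedback = internalisation_rate_linear(0.1, 2.0)
rate_before_signalling = internalisation_rate_feedback(0.1, 2.0, 0.0, 0.0)
rate_during_signalling = internalisation_rate_feedback(0.1, 2.0, 1.0, 3.0)
```

under the feedback form, no complexes are removed before any free phosphorylated stat has accumulated, and removal accelerates as signalling builds up; this is the behaviour expected from a negative feedback loop.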
it can be observed that for smaller values of the distance threshold, which indicate better support from the data to the mathematical model, the relative probability of hypothesis is higher than that of hypothesis . we then made use of this result to explore the mathematical models for both cytokines under hypothesis ; in particular, we performed parameter calibration. to this end (and as described in material and methods under mathematical models and bayesian inference), we carried out bayesian inference together with the mathematical models (hypothesis ) and the experimental data sets to quantify the reaction rates (see supp. fig. ) and initial molecular concentrations (see table and table ). the bayesian parameter calibration of the two models of cytokine signaling allows one to quantify the observed kinetics of pstat / phosphorylation induced by hypil- and il- in rpe and th- cells (figure c). substantial differences in stat association rates to and dissociation rates from the dimeric complexes were inferred to critically contribute to defining pstat / kinetics. figure d shows the kernel density estimates (kdes) for the posterior distributions of the rate constants and initial concentrations in the models. 𝑘$% & denotes the rate at which stat𝑖 binds to gp and 𝑘$' & denotes the rate at which stat𝑖 binds to il- ra, for 𝑖 ∈ { , }. our results indicate that stat and stat exhibit different binding preferences towards il- ra and gp , respectively. while stat exhibits stronger binding to il- ra than gp (𝑘#' & > 𝑘#% & ), stat exhibits stronger binding to gp than il- ra (𝑘(%& > 𝑘(' & ), in agreement with previous observations ( ).

il- rα cytoplasmic domain is required for sustained pstat kinetics

the bayesian inference carried out with the experimental data and the mathematical models clearly indicated statistically significant differences in the binding rates of stat /stat to gp and il- ra, to account for the different phosphorylation kinetics exhibited by hypil- and il- . thus, we next investigated whether the more sustained stat activation by il- resulted from its specific engagement of il- ra. for that, we used rpe cells, which do not express il- ra (supp. fig. e), to systematically dissect the contribution of the il- ra cytoplasmic domain to the differential pstat activation by il- . il- ra’s intracellular domain is very short and only encodes two tyr susceptible to be phosphorylated in response to il- stimulation, i.e., tyr and tyr (figure a). we mutated these two tyr to phe to analyse their contribution to il- induced signaling. we stably expressed wt il- ra as well as different il- ra tyr mutants in rpe cells with comparable cell surface expression levels (figure b). importantly, this reconstituted experimental system mimicked the pstat / activation kinetics of t-cells (supp. fig. a). as the endogenous gp expression levels remain unaltered, all generated clones exhibited very comparable responses to hypil- (figure b, bottom panels). il- triggered comparable levels of stat and stat activation in rpe cells reconstituted with il- ra wt and il- ra y f mutant, suggesting that this tyr residue does not contribute to signaling by this cytokine (figure b and supp. fig. b). in rpe cells reconstituted with the il- ra y f or y f-y f mutants, il- stimulation resulted in % of the stat activation, but only % of the stat activation levels induced by this cytokine relative to il- ra wt (figure b) ( ). these observations suggest a tight coupling of stat phosphorylation to one of the receptor chains; namely, il- ra with pstat and gp with pstat , respectively.
we next tested how the cytoplasmic domains of gp and il- ra shape the pstat kinetic profiles. thus, we generated a stable rpe clone expressing a chimeric construct comprised of the extracellular and transmembrane domain of il- ra but the cytoplasmic domain of gp (figure c, supp. fig. a). again, as both cell lines express unaltered endogenous gp levels, they exhibited comparable responses to hypil- (figure c). strikingly, this domain-swap resulted in a transient pstat kinetic response by il- comparable to hypil- stimulation. stat activation, on the other hand, remained unaltered, suggesting that the cytoplasmic domain of il- ra is essential for a sustained pstat response but not for pstat . two plausible scenarios could explain the observed pstat / activation differential by hypil- and il- : i) il- ra-jak complex phosphorylates stat faster than gp -jak complex or ii) pstat is more quickly dephosphorylated in the il- /gp receptor homodimer. in the latter case, pstat deactivation by constitutively expressed phosphatases could be an additional factor of regulation. indeed, shp- has been described to bind to gp and shape il- responses ( ). however, our bayesian inference results (together with the mathematical models and the experimental data) identified the stat/receptor association rates as the only rates that could account for the greater and more sustained activation of stat by il- . we note (as described in the material and methods) that the phosphorylation rate, denoted by q, of stat and stat when bound to a dimer (homo- or hetero-) has been assumed to be independent of the stat type and the receptor chain. moreover, the model also included dephosphorylation of free pstat molecules, and predicted that the rates at which these reactions occur (𝑑# and 𝑑() had rather similar posterior distributions, hence arguing against a potential role of phosphatases in specifically targeting stat upon hypil- stimulation.
to distinguish between the two plausible scenarios, we next determined the rates of pstat / dephosphorylation by blocking jak activity upon cytokine stimulation making use of the jak inhibitor tofacitinib in rpe cells. tofacitinib was added minutes after stimulation with either cytokine and pstat and pstat levels were measured at the indicated times. jak inhibition markedly shortened the pstat / activation profiles induced by both cytokines (figure d, supp. fig. b). the relative dephosphorylation rates could then be determined by the signal intensity ratio of +/- tofacitinib. even though pstat levels were more affected by jak inhibition than those of pstat , the observed relative changes were nearly identical for il- and hypil- . these findings were also confirmed for th- cells (supp. fig. c & d) and indicate that selective phosphatase activity cannot serve as an explanation for the pstat / differential by hypil- and il- , in agreement with our mathematical modelling predictions. similarly, we tested whether neosynthesis of feedback inhibitors such as socs ( ) would selectively impair signaling by hypil- but not by il- . to this end we pre-treated cells with cycloheximide (chx) and followed the pstat / kinetics induced by the two cytokines (supp. fig. a & b). chx treatment resulted in more sustained pstat activity for both cytokines. to our surprise, stat phosphorylation by il- was even more sustained while pstat levels induced by il- remained unaffected.
these observations exclude that feedback inhibitors selectively impair stat activation kinetics by hypil- and thus do not account for the faster stat dephosphorylation kinetics observed under hypil- stimulation. overall our data from the chimera and mutant experiments, which were not used in the bayesian calibration, provide strong and independent support, as well as validation, to the mathematical models of hypil- and il- signaling, and point to the differential association/dissociation of stat and stat to il- ra and gp , respectively, as the main factor defining stat phosphorylation kinetics in response to hypil- and il- stimulation. unique and overlapping effects of il- and hypil- on the th- phosphoproteome thus far, we have investigated the differential activation of stat /stat induced by hypil- and il- . next, we asked whether il- and il- induced the activation of additional and specific intracellular signaling programs that could contribute to their unique biological profiles. to this end, we investigated the il- and hypil- activated signalosome using quantitative mass-spectrometry-based phospho-proteomics. macs-isolated cd + were polarized into th- cells and expanded in vitro for stable isotope labelling by amino acids in cell culture (silac). cells were then stimulated for min with saturating concentrations of il- , hypil- or left untreated. samples were enriched for phosphopeptides (ti-imac), subjected to mass spectrometry and raw files analysed by maxquant software (supp. fig. a). in total we could quantify ~ phosphopeptides from proteins, identified across all conditions (unstimulated, il- , hypil- ) for at least two out of three tested donors. for il- and hypil- we detected similar numbers of significantly upregulated ( vs. ) and downregulated ( vs. ) phosphorylation events (figure a) and systematically categorized them in context with their cellular location and ascribed biological functions (supp. fig. b & c) ( ). 
the two cytokines shared approximately half of the upregulated and one third of the downregulated phospho-peptides (supp. fig. a) but also exhibited differential target phosphorylation (figure b and supp. fig. b). as expected, we found multiple members of the stat protein family among the top phosphorylation hits by the two cytokines, validating our study (figure b & c). in line with our previous observations, we detected the same relative amplitudes for tyrosine phosphorylated stat and stat . in addition to tyrosine-phosphorylation, we detected robust serine-phosphorylation on s for stat and stat (figure c). while ps-stat activity correlated with py-stat with il- being more potent than hypil- , this was not the case for stat . despite an identical py-stat phosphorylation profile, hypil- induced a ~ % higher ps-stat relative to il- (figure c). these results were corroborated by following the phosphorylation kinetics of ps-stat and ps-stat by flow cytometry (figure d). given the overlapping phospho-proteomic changes, gene ontology (go) analysis associated several sets of phosphopeptides with biological processes that were mostly shared between both cytokines (figure e, supp. fig. c). a large set of phospho-peptides was linked to transcription initiation (including jak/stat signaling) or mrna modification (figure e). interestingly, il- stimulation was associated with negative regulation of rna polymerase ii, whereas a positive regulation was detected for hypil- .
a closer look into the functional regulation of rna-pol ii activity by the two cytokines revealed that multiple proteins involved in this process were differentially regulated by hypil- and il- (figure f). while positive regulators of rna-pol ii transcription, such as negative elongation factor a (nelfa), ppm g, rchy and pol ra, were much more phosphorylated in response to hypil- than il- , negative regulators of rna-pol ii transcription, such as larp , were much more engaged by il- treatment than by hypil- (figure f). interestingly, in a previous study we linked rna-pol ii regulation with the levels of stat s phosphorylation induced by hypil- via recruitment of cdk to stat dependent genes ( ). our phospho-proteomic analysis thus, suggests that il- and hypil- recruit different transcriptional complexes that ultimately could contribute to provide gene expression specificity by the two cytokines. additionally, we identified several interesting il- -specific phosphorylation targets. one example was ubiquitin protein ligase e component n-recognin (ubr ). phosphorylated ubr leads to ubiquitination and subsequent degradation of rorgc ( ), the key transcription factor required for th- lineage commitment, thus limiting th- differentiation (supp. fig. d). a second example is pak , which phosphorylates and stabilizes foxp leading to higher levels of treg cells (supp. fig. d) ( ). moreover, il- stimulation led to a very strong phosphorylation of bcl -associated agonist of cell death (bad), a critical regulator of t-cell survival and a well-known substrate of the pak kinase ( ). overall, our data show a large overlap between the il- and il- signaling program, with a strong focus on jak/stat signaling. however, il- engages additional signaling intermediaries that could contribute to its unique immuno-modulatory activities. further studies will be required to assess how these il- specific signaling pockets contribute to shape il- responses. 
kinetic decoupling of gene induction programs depends on sustained stat activation and irf expression by il- 

next, we investigated how the different kinetics of stat activation induced by hypil- and il- ultimately modulated gene expression by these two cytokines. to this end, we performed rna-seq analysis of th- cells stimulated with hypil- or il- for h, h and h to obtain a dynamic perspective of gene regulation. we identified ~ shared genes that could be quantified for all three donors and throughout all tested experimental conditions. in a first step, we compared how similar the gene programs induced by hypil- and il- were. principal component analysis (pca) was run for a subset of genes, found to be significantly up- (total ~ ) or downregulated (total ~ ) by either of the experimental conditions (p value ≤ . , fold change ≥ + or ≤ − ). at one hour of stimulation hypil- and il- induced very similar gene programs, with the two cytokines clustering together in the pca regardless of whether we focused on the subsets of upregulated or downregulated genes (figure a). however, the similarities between the two cytokines changed dramatically in the course of continuous stimulation. while the two cytokines induced the downregulation of comparable gene programs at h and h stimulation, as denoted by the close clustering in the pca (figure a, right panel) and the fraction of shared genes (~ %, figure b, supp. fig. a-c, supp. fig. a), this was not observed for upregulated genes.
although the two cytokines induced comparable gene upregulation programs after h of stimulation (~ % shared genes), this trend almost completely disappeared at later stimulation times (figure a & b, supp. fig. b). this is well-reflected by the absolute numbers of up- or downregulated genes observed for il- and hypil- (figure c). stimulation with both cytokines yielded a similar trend of gene downregulation (figure c, right panel). however, while hypil- stimulation resulted in a spike of gene upregulation at h that quickly disappeared at later stimulation times, il- stimulation was capable of increasing the number of upregulated genes beyond h of stimulation and maintained it even after h (figure c, left panel). this “kinetic decoupling” of gene induction seems to have a striking functional relevance. gene set enrichment analysis (gsea) ( ) identified several reactome pathways to be enriched for il- over the course of stimulation, most of them linked with interferon signaling and immune responses (figure d). in contrast, for hypil- stimulation no pathway enrichment was detected. most importantly, the vast majority of il- -induced genes that were associated with these pathways belonged to genes upregulated by il- treatment and that have been previously linked to stat activation ( , ) (supp. fig. c). although hypil- treatment resulted in the induction of some of these genes, their expression was very transient in time, in agreement with the short stat activation kinetic profile exhibited by hypil- (supp. fig. b & c). next, we performed cluster analysis to find further similarities and discrepancies between the gene expression programs engaged by hypil- and il- (figure e). since genes downregulated by il- and hypil- showed overall good similarity throughout the whole kinetic series, we mainly focused on differences in upregulated gene induction. we identified three functionally relevant gene clusters.
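the comparison of up- and downregulated gene sets between the two cytokines can be sketched in a few lines of python. this is a hedged illustration, not the analysis pipeline used here: the gene names, p values, fold changes and both cut-offs (`p_max`, `fc_min`) are hypothetical, and the shared fraction is assumed to be the intersection divided by the union of the two sets, which may differ from the definition used for the figures.

```python
def classify_genes(results, p_max, fc_min):
    """Split genes into up- and downregulated sets.

    `results` maps gene name -> (p_value, signed_fold_change).
    A gene counts as regulated if p_value <= p_max and the signed fold
    change is >= +fc_min (up) or <= -fc_min (down).
    """
    up, down = set(), set()
    for gene, (p_value, fold_change) in results.items():
        if p_value > p_max:
            continue
        if fold_change >= fc_min:
            up.add(gene)
        elif fold_change <= -fc_min:
            down.add(gene)
    return up, down


def shared_fraction(set_a, set_b):
    """Intersection over union of two gene sets (assumed definition)."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical results for two stimulation conditions at one time point.
cytokine_a = {"g1": (0.01, 3.0), "g2": (0.01, -3.0), "g3": (0.2, 5.0), "g4": (0.01, 2.5)}
cytokine_b = {"g1": (0.02, 2.2), "g2": (0.5, -4.0), "g5": (0.01, 2.1)}

up_a, down_a = classify_genes(cytokine_a, p_max=0.05, fc_min=2.0)
up_b, down_b = classify_genes(cytokine_b, p_max=0.05, fc_min=2.0)

shared_up = shared_fraction(up_a, up_b)
shared_down = shared_fraction(down_a, down_b)
```

repeating this per time point gives both the absolute numbers of regulated genes and the shared fraction over the course of stimulation.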
the first gene cluster corresponds to genes that are transiently and equally induced by hypil- and il- . these genes peak after one hour and return to basal levels after h and h of stimulation (figure e). interestingly, this cluster contains classical il- -induced and stat -dependent genes, such as members of the nfkb and jun/fos transcriptional complex ( ), as well as the feedback inhibitor suppressor of cytokine signaling (socs ) ( ) and t-cell early activation marker cd (figure e). a second cluster of genes corresponded to genes that were persistently activated by il- but only transiently by hypil- (figure e). among these genes we found classical stat -dependent genes, such as socs , programmed cell death ligand (pdl = cd ) ( ) and members of the interferon-induced protein with tetratricopeptide repeats (ifit) family. the third cluster of genes corresponded to genes exhibiting strong and sustained activation by il- after h and h stimulation but no activation by hypil- at all. this “ nd wave” of gene induction by il- was almost exclusively comprised of classical interferon stimulated genes (isgs) (supp. fig. c), such as stat & , guanylate binding protein (gbp ), gbp , & , and irf & . it is worth mentioning that genes in the third cluster appear to require persistent stat activation ( , ) and were the basis for the ifn signature identified in our reactome pathway analysis. still, we were surprised by the magnitude of this nd gene wave. even though il- exerts a sustained pstat kinetic profile, pstat levels were down to ~ % of maximal amplitude after h of stimulation. we reasoned that additional factors could further amplify the stat response for il- but not for hypil- . within the st wave of stat -dependent genes, we also spotted the transcription factor interferon response factor (irf ), that was continuously induced throughout the kinetic series in response to il- but only transiently spiking after h of hypil- stimulation (figure e). irf expression was shown to prolong pstat kinetics ( ) and to be required for il- -dependent tr- differentiation and function ( ). we confirmed the kinetics of irf protein expression by flow cytometry and showed higher and more sustained protein levels after il- stimulation relative to hypil- (figure a). next, we tested in our rpe cell system whether sirna-mediated knockdown of irf would alter the gene induction profiles of certain stat or stat -dependent marker genes. in rpe cells reconstituted with il- ra, irf protein levels peaked around h after stimulation with il- , and transfection with irf -targeting sirna knocked down expression by > % (figure b). importantly, knockdown of irf did not alter the overall kinetics of pstat and pstat activation (figure c). induction of stat -dependent genes stat , gbp and oas as well as stat -dependent gene socs were followed by rt qpcr (figure d). interestingly, up to h of stimulation, the gene induction curves were identical for control- and irf -sirna treated cells. later than h (that is, when irf protein levels are peaking) the gene induction was decreased by - % in the absence of irf . strikingly, expression of socs , a classical stat -dependent reporter gene, was transient and independent of irf levels, highlighting that irf selectively amplifies stat -dependent gene induction.
taken together our data support a scenario whereby il- by exhibiting a kinetic decoupling of stat and stat activation is capable of triggering independent gene expression waves, which ultimately contribute to shape its distinct biology. il- -induced stat response drives global proteomic changes in th- cells next, we aimed to uncover how the distinct gene expression programs engaged by hypil- and il- ultimately relate to alterations of the th- cell proteome. for that, we continuously stimulated silac labelled th- cells for h with saturating doses of il- and hypil- and compared quantitative proteomic changes to unstimulated controls (figure a). we quantified ~ proteins present in all three biological replicates and in all tested conditions (unstimulated/il- /hypil- ). both cytokines downregulated a similar number of proteins (il- : , hypil- : ) (figure b) with approximately half of them being shared by the two cytokines, mimicking our observations in the rna-seq studies (figure c, supp. fig. a). with upregulated proteins, il- was almost twice as potent as hypil- ( proteins) with very little overlap. among the upregulated proteins by il- but not hypil- , we detected several proteins with described immune-modulatory functions on t-cells. one of these proteins was transforming growth factor b (tgf-b), which is a key regulator with pleiotropic functions on t-cells ( ). tgf-b has been identified to synergistically act with il- to induce il- secretion from tr- cells – thus accounting for one of the key anti-inflammatory functions of il- ( ). on the other hand, we also found selplg-encoded protein rsgl- which is critically required for efficient migration and adhesion of th- cells to inflamed intestines ( , ). interestingly, we found larp moderately upregulated by il- . this negative regulator for rna pol ii was also identified in our phospho-target screening and selectively engaged by il- (figure f). 
il- and hypil- share ~ % of downregulated proteins, but without strong functional patterns. both cytokines downregulated several proteins related to mitotic cell cycle (lig , csnk b, psmb ) and mrna processing and splicing (ncbp , pcbp , nudt ) ( ). strikingly, a significant number (~ %) of proteins upregulated by il- belong to the group of isgs (figure b & c, supp. fig. b). this particular set of proteins including stat , stat , mx dynamin like gtpase (mx ), interferon stimulated gene (isg ) or poly(adp-ribose) polymerase family member (parp ) was not markedly altered by hypil- . of note, the overall expression patterns of the most significantly altered proteins are congruent with the gene induction patterns observed after h and h (figure d & e, supp. fig. b). similarly, gsea reactome analysis again identified pathways associated with interferon signaling and cytokine/immune system but failed to detect any significant functional enrichment by hypil- (figure e, supp. fig. b & c). finally, we correlated rnaseq-based gene induction patterns with detected proteomic changes. to our surprise we only found a relatively low number of shared hits. however, the identified proteins belong exclusively to a group upregulated by il- (figure f). they are all located in the “ nd gene wave” cluster and all of them are regulated by isgs (figure e). taken together these results provide compelling evidence that sustained pstat activation by il- accounts for its gene induction and proteomic profiles, thus giving a mechanistic explanation for the diverse biological outcomes of il- and il- .
our observations are in good agreement with previous findings in cancer cells showing that stat activation in particular is responsible for proteomic remodeling by il- ( ).

receptor and stat concentrations determine the nature of the il- /il- response

our data suggest that stat molecules compete for binding to a limited number of phospho-tyr motifs in the intracellular domains of cytokine receptors. a direct consequence of this hypothesis is that cells can adjust and change their responses to cytokines by altering their concentrations of specific stats or receptor molecules. to assess to what degree immune cells differ in their expression of cytokine receptors and stats, we investigated the protein expression levels of il- ra, gp , il- ra, stat and stat across different immune cell populations, making use of the immunological proteomic resource (immpres - http://immpres.co.uk) database. strikingly, the expression levels of these proteins change dramatically across the populations studied (figure a), suggesting that these cells could potentially produce very different responses to hypil- and il- stimulation. in order to quantify (and predict) how changes in the expression levels of different proteins modify the kinetics of pstat, we made use of the two mathematical models of hypil- and il- stimulation and the parameters inferred with bayesian methods. our mathematical models could accurately reproduce the experimental results generated across our study, i.e., signaling by the il- ra chimeric and il- ra-y f mutant receptors and dose/response studies (supp. fig. a-c), making use of the posterior parameter distributions generated from the bayesian parameter calibration. having developed mathematical models which are able to accurately explain the experimental data (supp. fig. b and c) and reproduce independent experiments (fig.
b and c), we then sought to use the models to predict pstat signaling kinetics under different concentration regimes of receptors and stats. to simplify the simulations, we focused our analysis on gp and stat proteins, two of the proteins that vary greatly across the different immune populations (figure a). as baseline values for the concentrations [𝐺𝑃 ( )], [𝐼𝐿 𝑅𝑎( )], [𝑆𝑇𝐴𝑇 ( )] and [𝑆𝑇𝐴𝑇 ( )] we used approximately the median values from the posterior distributions for each parameter: [𝐺𝑃 ( )] = nm, [𝐼𝐿 𝑅𝑎( )] = nm and [𝑆𝑇𝐴𝑇 ( )] = [𝑆𝑇𝐴𝑇 ( )] = nm. to see the effect of varying gp concentrations on pstat signaling, we decreased the initial concentration of gp and simulated the model using the accepted parameter sets from the abc-smc to inform the other parameter values. a tenfold reduction in gp concentration ([𝐺𝑃 ( )] = . 𝑛𝑀) resulted in a striking loss in pstat levels induced by hypil- , with very little effect on pstat levels induced by this cytokine (figure b). pstat / kinetics induced by il- , however, were not affected by this decrease in gp concentration (figure b). interestingly, the hypil- signaling profile predicted by our model at low gp concentrations strongly resembles the one induced by hypil- in th- cells (figure c), where very low levels of gp are found, further confirming the robustness of the predictions generated by our mathematical models. when the concentration of stat was increased by a factor of ten ([𝑆𝑇𝐴𝑇 ( )] = nm), both hypil- and il- induced significantly higher levels of pstat activation (figure b).
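the competition argument behind these simulations can be illustrated with a much simpler calculation than the ode models and abc-smc calibration used in this study. the sketch below is only a textbook equilibrium approximation, assuming two stats that bind reversibly to one shared pool of phospho-tyr sites; the names `STAT_A` and `STAT_B`, the dissociation constants and the concentrations are hypothetical placeholders and are not parameters inferred in this work.

```python
def site_occupancy(stat_conc, kd):
    """Equilibrium fraction of a shared phospho-Tyr site bound by each STAT.

    Standard competitive-binding result for ligands sharing one site:
        occupancy_i = (S_i / Kd_i) / (1 + sum_j S_j / Kd_j)
    stat_conc and kd are dicts keyed by STAT name, in the same units (e.g. nM).
    """
    weights = {name: stat_conc[name] / kd[name] for name in stat_conc}
    denominator = 1.0 + sum(weights.values())
    return {name: weight / denominator for name, weight in weights.items()}


# Hypothetical dissociation constants: STAT_B binds the site ten times tighter.
KD = {"STAT_A": 100.0, "STAT_B": 10.0}

# Baseline: both STATs at the same (hypothetical) concentration.
baseline = site_occupancy({"STAT_A": 100.0, "STAT_B": 100.0}, KD)

# "Primed" condition: STAT_A concentration raised tenfold, STAT_B unchanged.
primed = site_occupancy({"STAT_A": 1000.0, "STAT_B": 100.0}, KD)

for name in KD:
    print(f"{name}: baseline {baseline[name]:.3f} -> primed {primed[name]:.3f}")
```

in this toy setting, raising the concentration of one stat increases its own share of the sites and lowers the share of its competitor, which is the qualitative behaviour expected for competitive binding. it does not capture kinetics, receptor assembly or dephosphorylation.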
pstat levels were not affected for hypil- stimulation but were decreased for il- stimulation (figure b), further indicating the competitive nature of the binding of stat and stat to il- ra and gp . overall, our mathematical model predicts that changes in gp and stat expression produce a substantial remodeling of the hypil- and il- signalosome, which ultimately could lead to aberrant responses.

stat protein levels in sle patients modify hypil- and il- signaling responses

stat is a classical ifn responsive gene and stat levels are highly increased in environments rich in ifns ( ). thus, we next asked whether stat levels would be increased in sle patients, an example of a disease where ifns have been shown to correlate with a poor prognosis, making use of available gene expression datasets ( ). we did not find differences in the expression of gp , il- ra or il- ra in sle patients (figure c). however, we detected a significant increase in the levels of stat and stat transcripts in these patients when compared to healthy controls, with the increase in stat expression being significantly more pronounced (figure c). since our mathematical model predicted that increases in stat expression could significantly change the cellular responses induced by hypil- and il- , we next tested this prediction experimentally. for that, we primed th- cells with ifna overnight to increase total stat levels (and to a lesser extent stat ) in these cells (supp. fig. a). while both hypil- and il- induced comparable levels of pstat in primed and non-primed th- cells, levels of pstat induced by the two cytokines were significantly upregulated in primed th- cells, resulting in a biased stat response and confirming our model predictions (figure d). we next investigated whether this biased stat activation by hypil- and il- observed in ifna -primed th- cells was also present in sle patients.
for that, we collected pbmcs from six sle patients and five age-matched healthy controls and measured stat and stat expression, as well as pstat and pstat induction by hypil- and il- after min treatments in cd t cells. importantly, results comparable to those obtained with ifn-primed th- cells were obtained, with a signaling bias towards pstat in cd + t cells from sle patients stimulated with hypil- and il- (figure e, supp. fig. b & c), further supporting the notion that stat concentrations play a critical role in defining cytokine responses in autoimmune disorders. our data show that stat and stat compete for phospho-tyr motifs in gp , with stat having an advantage resulting from its tighter affinity to gp . finally, we asked whether crippling jak activity by using sub-saturating doses of jak inhibitors could differentially affect stat and stat activation by hypil- and therefore rescue the altered cytokine responses found in sle patients. to test this, rpe and th- cells were stimulated with saturating concentrations of hypil- and titrated concentrations of tofacitinib, a clinically approved jak inhibitor. strikingly, tofacitinib inhibited hypil- induced pstat more efficiently than pstat in both rpe cells and th- cells (figure f). at nm concentration, tofacitinib inhibited pstat levels induced by hypil- by %, while it inhibited pstat levels by only % (figure f), an effect that we did not observe for il- stimulation (supp. fig. d). overall, our results show that the changes in stat concentrations found in autoimmune disorders shape cytokine signaling responses and could contribute to disease progression.

discussion

cytokine pleiotropy is the ability of a cytokine to exert a wide range of biological responses in different cell types. this functional pleiotropy has made the study of cytokine biology extremely challenging given the strong cross-talk and shared usage of key components of their signaling pathways, leading to a high degree of signaling plasticity, yet still allowing functional selectivity ( , ). here we aimed to identify the underlying determinants that define cytokine functional selectivity by comparing il- and il- at multiple scales, ranging from cell surface receptors to proteomic changes. we show that il- triggers a more sustained stat phosphorylation than il- , via a high affinity stat /il- ra interaction centered around tyr on il- ra. this in turn results in a more sustained irf expression induced by il- , which leads to the upregulation of a second wave of gene expression unique to il- and comprised of classical isgs. we go one step further and show that this strong receptor/stat coupling is altered in autoimmune disorders where stat concentrations are often dysregulated. increased expression of stat in sle patients biases hypil- and il- responses towards stat activation, further contributing to the worsening of the disease. by using suboptimal doses of the jak inhibitor tofacitinib, we show that specific stat proteins engaged by a given cytokine can be targeted.
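one way to picture how a single sub-saturating inhibitor dose can affect two readouts to different extents is a standard hill-type dose-response curve. the sketch below is illustrative only: the function names, the two readout labels and the ic50 values are hypothetical and were not fitted to the tofacitinib titrations reported here.

```python
def fraction_remaining(dose, ic50, hill=1.0):
    """Fraction of the uninhibited signal left at a given inhibitor dose.

    Hill-type inhibition curve: 1 / (1 + (dose / ic50) ** hill).
    dose and ic50 must share the same units.
    """
    if dose < 0 or ic50 <= 0:
        raise ValueError("dose must be >= 0 and ic50 must be > 0")
    return 1.0 / (1.0 + (dose / ic50) ** hill)


# Hypothetical IC50 values (arbitrary units); NOT measured values from this study.
IC50 = {"readout_sensitive": 10.0, "readout_resistant": 100.0}


def percent_inhibition(dose):
    """Percent inhibition of each readout at one inhibitor dose."""
    return {
        name: 100.0 * (1.0 - fraction_remaining(dose, ic50))
        for name, ic50 in IC50.items()
    }


result = percent_inhibition(30.0)
for name, value in result.items():
    print(f"{name}: {value:.1f} % inhibited")
```

with these placeholder values, a dose lying between the two ic50s suppresses the sensitive readout strongly while leaving most of the resistant readout intact, which is the window a sub-saturating dose would exploit.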
overall, our study highlights a new layer of cytokine signaling regulation, whereby stat affinity to specific cytokine receptor phospho-tyr motifs controls stat phosphorylation kinetics and the identity of the gene expression program engaged, ultimately ensuring the generation of functional diversity through the use of a limited set of signaling intermediaries. the tight coupling of one receptor subunit to one particular stat that we have identified in our study is a rather unusual phenomenon for heterodimeric cytokine receptor complexes, and was first suggested by owaki et al. ( ). generally, the entire signaling output driven by a cytokine-receptor complex emanates from a dominant receptor subunit, which carries several tyr residues susceptible to phosphorylation ( , ). this in turn results in competition between different stats for binding to shared phospho-tyr motifs in the dominant receptor chain, leading to different kinetics of stat phosphorylation, as observed for il- stimulation ( ) (figure b). moreover, this localized signaling quantum allows phosphatases and feedback regulators, induced upon cytokine stimulation, to act in synergy to reset the system to its basal state, generating a very synchronous and coordinated signaling wave. although very effective, this molecular paradigm has its limitations. stat competition for the same pool of phospho-tyr makes the system very sensitive to changes in stat concentration. ifng primed cells, which exhibit increased stat levels, trigger an ifng-like stat response upon il- stimulation ( ). il- anti-inflammatory properties are lost in cells with high levels of stat expression, as a result of a pro-inflammatory environment rich in ifns ( ). indeed, we show that stat transcript levels are increased in crohn’s disease and sle patients and that they contribute to altering il- responses. strikingly, il- appears to have evolved away from this general model of cytokine signaling activation.
our results show that stat activation by il- is tightly coupled to il- ra, while stat activation by this cytokine mostly depends on gp . this decoupled stat and stat activation by il- is possible thanks to the presence of a putative high affinity stat binding site on il- ra that resembles the one present in ifngr ( ). as a result, il- can trigger sustained and independent phosphorylation of both stat and stat . this unique feature of il- allows it to induce robust responses in dynamic immune environments. indeed, our mathematical models of cytokine signaling and bayesian inference, together with the experimental observations, show that changes in receptor concentration minimally affected pstat / induced by il- , while they fundamentally altered il- responses. overall, our data show that cytokine responses are versatile and adapt to the continuously changing cell proteome, highlighting the need to measure cytokine receptor and stat expression levels, in addition to cytokine levels, in disease environments to better understand and predict altered responses elicited by dysregulated cytokines. in recent years, it has become apparent that the stability of the cytokine-receptor complex influences signaling identity by cytokines ( ). short-lived complexes activate less efficiently those stat molecules that bind with low affinity to the phospho-tyr motifs in a given cytokine receptor ( ). our current results further support this kinetic discrimination mechanism for stat activation.
our statistical inference identified differences in stat recognition to the cytokine receptor phospho-tyr motifs as one of the major determinants of stat phosphorylation kinetics. this parameter alone was sufficient to explain transient and sustained stat phosphorylation induced by il- and il- , respectively, without the need to invoke the action of phosphatases or negative feedback regulators such as socss. indeed, our results indicate that the rate of stat dephosphorylation is similar between the il- and il- systems, suggesting that phosphatases do not contribute to these early kinetic differences. moreover, blocking protein translation, and therefore the upregulation of negative feedback regulators by il- treatment did not result in a more sustained stat phosphorylation by il- , again indicating that the transient kinetics of stat phosphorylation by il- is encoded at the receptor level and does not require further regulation. however, recent reports have found that the amplitude of stat phosphorylation in response to il- is regulated by levels of ptpn expression, suggesting that phosphatases can play additional roles in shaping il- responses beyond controlling the kinetics of stat activation ( ). stat phosphorylation levels by il- on the other hand were significantly more sustained in the absence of protein translation, suggesting that negative feedback mechanisms are required to downmodulate signaling emanating from high affinity stat-receptor interactions. overall our results suggest that while phosphatases and negative feedback regulators play an important role in maintaining cytokine signaling homeostasis ( ), the kinetics of stat activation appears to be already encoded at the level of receptor engagement, thus ensuring maximal efficiency and signal robustness. cytokine signaling plasticity can occur at the level of receptor activation. 
in the past years, a scenario has emerged suggesting that the absolute number of signaling-active receptor complexes is a critical determinant for signal output integration. accordingly, specific biological responses were shown to be tuned either by the abundance of cell surface receptors ( , ) or by the level of receptor assembly ( , , ). here, we show for the first time il- -induced dimerization of il- ra and gp at the cell surface of live cells, in good agreement with previous studies on heterodimeric cytokine receptor systems ( , ). for il- , the receptor subunits il- ra and gp can be expressed at different ratios, as seen for naïve vs. activated t-cells ( ) as well as intestinal cells ( ). on t-cells, particularly after activation, il- ra is expressed in strong excess over gp , rendering gp the limiting factor for receptor complex assembly ( ). interestingly, we observe that in addition to faster kinetics of stat phosphorylation, hypil- treatment induces a lower maximal amplitude of pstat activation in t cells. this is in stark contrast to our results in rpe cells, where a high abundance of gp (~ - copies of cell surface gp ) is found. in these cells both cytokines elicited similar amplitudes of stat phosphorylation. our results suggest that surface receptor density, in synergy with the binding dynamics of stats to phospho-tyr motifs on cytokine receptors, acts to define the amplitude and kinetics of stat activation in response to cytokine stimulation.
the distinct stat and stat kinetic profiles induced by il- and il- are the prerequisite for time-correlated decoupling of genetic programs: a “shared gp /stat -dependent wave” and an il- -“unique il- ra/stat -dependent wave”. however, pstat levels induced by il- at h were down to ~ % of maximal amplitude, suggesting that additional factors would be required to amplify the initial stat response elicited by il- . we observed that il- induces the expression of an early wave of classical stat -dependent genes, which is also shared by il- . however, while il- induces the upregulation of these genes throughout the entire duration of the experiment, il- only resulted in a transient spike. we reasoned that the additional factor required for il- signal amplification would be among these early stat -dependent genes. among this set of genes we found the transcription factor irf , which had been shown to act as a feedback amplifier of pstat activity ( ). importantly, irf protein levels have been shown to be upregulated in response to il- and ifng but not to il- stimulation in hepatocytes ( ). irf plays a key role in chromatin accessibility, which is critically required for il- -induced differentiation of tr cells and subsequent il- secretion ( ). here, we show that the contribution of irf to stat - but not stat -dependent genes is a generic feature of il- signaling. this readily explains the significant transcriptomic overlap of il- with type i ( ) or type ii interferons ( ) after long-term stimulation with these cytokines. along this line, it is not surprising that il- , beyond its well-described effects on t-cell development, can also mount a considerable antiviral response, as shown in hepatic cells and pbmcs ( , ).
our results suggest that by modulating the kinetics of stat phosphorylation, cytokines can modulate the expression of accessory transcription factors, such as irf , that act in synergy with stats to fine-tune gene expression and provide functional diversity.

acknowledgments

we thank members of the moraga, molina-parís, piehler and mitra laboratories for helpful advice and discussion. we thank g. hikade and h. kenneweg for technical support, c. p. richter for providing software for single-molecule image analysis, r. kurre (integrated bioimaging facility osnabrück) for support with fluorescence microscopy and the fingerprints proteomics facility (dundee) for support with the mass spectrometry data. this work was supported by the stg, ls , wellcome-trust- /z/ /z (im ep), erc- -stg grant (im jmf ep pkf), embo (sw – ), dfg (sfb , p /z, jp), national heart, lung and blood institute (k hl , mk) and contrat de plan etat région hauts de france and institut pour la recherche sur le cancer de lille (sm sg). cmp and gl were supported by h , quantii. pj is supported by the epsrc, astrazeneca and smith institute (smith institute case studentship, award reference ). numerical work was undertaken on arc , which is part of the high performance computing facilities at the university of leeds, uk.

competing interests

the authors declare that they have no competing interests.

material and methods

protein expression and purification: murine il- was cloned as a linker-connected single-chain variant (p +ebi ) as described in ( ). human hyperil- (hypil- ) and murine single-chain il- were cloned into the pacgp -a vector (bd biosciences) in frame with an n-terminal gp signal sequence and a c-terminal hexahistidine tag, and produced using the baculovirus expression system, as described in ( ). baculovirus stocks were prepared by transfection and amplification in spodoptera frugiperda (sf ) cells grown in sf ii media (invitrogen), and protein expression was carried out in suspension trichoplusia ni (high five) cells grown in insectxpress media (lonza). purification was performed using the method described in ( ). for il- , the cells were pelleted by centrifugation at rpm, prior to a precipitation step through addition of tris ph . , cacl and nicl to final concentrations of mm, mm and mm respectively. the precipitate formed was then removed through centrifugation at rpm. nickel-nta agarose beads (qiagen) were added and the target proteins purified through batch binding followed by column washing in hbs-hi buffer (hbs buffer supplemented to mm nacl and % glycerol, ph . ). elution was performed using hbs-hi buffer plus mm imidazole. final purification was performed by size exclusion chromatography on an enrich sec column (biorad), again equilibrated in hbs-hi. concentration of the purified sample was carried out using kda millipore amicon-ultra spin concentrators. for hypil- , proteins were purified likewise, but in mm hepes (ph . ) containing mm nacl. recombinant cytokines were purified to greater than % homogeneity.
for cell surface labeling, the anti-gfp nanobodies (nb) “enhancer” and “minimizer” were used, which bind megfp with subnanomolar binding affinity ( ). nb was cloned into pet- a with an additional cysteine at the c-terminus for site-specific fluorophore conjugation in a : fluorophore:nanobody stoichiometry. furthermore, a (pas) sequence to increase protein stability and a his-tag for purification were fused at the c-terminus. protein expression in e. coli rosetta (de ) and purification by immobilized metal ion affinity chromatography were carried out by standard protocols. purified protein was dialyzed against hepes ph . and reacted with a two-fold molar excess of dy maleimide (dyomics), atto maleimide (at ) and atto rho maleimide (rho ) (atto-tec gmbh), respectively. after h, a -fold molar excess (with respect to the maleimide) of cysteine was added to quench excess dye. protein aggregates and free dye were subsequently removed by size exclusion chromatography (sec). a labeling degree of . - : fluorophore:protein was achieved, as determined by uv/vis spectrophotometry.

cd + t cell purification and th- differentiation: human buffy coats were obtained from the scottish blood transfusion service and peripheral blood mononuclear cells (pbmcs) of healthy donors were isolated from buffy coat samples by density gradient centrifugation according to the manufacturer’s protocols (lymphoprep, stemcell technologies). from each donor, x pbmcs were used for isolation of cd + t-cells. cells were decorated with anti-cd fitc antibodies (biolegend, # ) and isolated by magnetic separation according to the manufacturer’s protocols (macs miltenyi) to a purity > % cd +. freshly isolated resting cd + t cells ( x per donor) were activated under th- polarizing conditions using immunocult™ human cd /cd t cell activator (stemcell, cat# ) following manufacturer instructions for days in rpmi- , % v/v fbs, u/ml penicillin-streptomycin (gibco) in the presence of the cytokines il- (novartis, # , ng/ml), anti-il- antibody ( ng/ml, bd biosciences, # ), il- ( ng/ml, biolegend, # ). after three days of priming, cells were expanded for another days in the presence of il- ( ng/ml).

human sle patient samples: this study was authorized by the french competent authority dealing with research on human biological samples, namely the french ministry of research. the authorization number is ech / . to issue such authorization, the ministry of research sought the advice of an independent ethics committee, namely the “comité de protection des personnes,” which voted positively, and all patients gave their written informed consent. healthy volunteers were recruited to serve as healthy control individuals. blood samples from healthy controls and patients were collected in heparinized tubes (bd vacutainer , bd biosciences san jose, ca, usa) and pbmc samples were isolated using ficoll (pancoll, pan biotech #p - ) density gradient centrifugation. the isolated pbmcs were washed with pbs and the remaining red blood cells were lysed using rbc lysis buffer (ack lysing buffer, gibco #a - ) by incubating for min at room temperature. cells were washed in pbs, resuspended in ml of freezing medium (with dmso, pan biotech, #p - ) and transferred into a cryotube. cryotubes were placed in a freezing container (nalgene) at - °c and then transferred into a liquid nitrogen container for long-term storage.
classification and demographic information about sle patients and healthy controls: sle patients were included if they fulfilled the american college of rheumatology (acr) classification criteria (hochberg mc. updating the american college of rheumatology revised criteria for the classification of systemic lupus erythematosus ( )). exclusion criteria were current intake of mg or more of prednisone or equivalent and/or use of immunosuppressants within the previous months before inclusion. use of hydroxychloroquine was not an exclusion criterion. patients were mostly in clinical remission, half with biological remission, half with persistent anti native dna autoantibodies. all sle patients and healthy controls were females between and years old.

(phospho-) proteomics: for (phospho-) proteomic experiments, th- cells from each donor were split into three different conditions after initial expansion: light silac media ( mg/ml l-lysine k (sigma, #l ) and mg/ml l-arginine r (sigma, #a )), medium silac media ( mg/ml l-lysine u- c k (ckgas, #clm- - . ) and mg/ml l-arginine u- c r (ckgas, #clm- - . )) and heavy silac media ( . mg/ml l-lysine u- c ,u- n k (ckgas, #cnlm- -h- . ) and . mg/ml l-arginine u- c ,u- n r (ckgas, #cnlm- -h- . )) prepared in rpmi silac media (thermo scientific, # ) supplemented with % dialyzed fbs (hyclone, #sh . ), ml l-glutamine (invitrogen, # ), ml pen/strep (invitrogen, # ), ml mem vitamin solution (thermo scientific, # ), ml selenium-transferrin-insulin (thermo scientific, # ) and expanded in the presence of ng/ml il- and ng/ml anti-il for another days in order to achieve complete labelling. media was exchanged every two days. incorporation of the medium and heavy versions of lysine and arginine was checked by mass spectrometry, and samples with an incorporation greater than % were used. after expansion, cells were starved without il- for hours before stimulation with nm il- or nm hypil- for minutes (phosphoproteomics) or h (global proteomic changes).
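the incorporation check mentioned above can be expressed as a short calculation. the sketch below assumes the common definition of label incorporation for a peptide, labelled intensity divided by the sum of labelled and unlabelled intensity, and summarises a sample by the median over peptides. that summary, the function names and the example intensities are assumptions made for illustration; the acceptance threshold is left as a parameter because its value is not restated here.

```python
from statistics import median


def incorporation(labelled, unlabelled):
    """Label incorporation of one peptide: labelled / (labelled + unlabelled)."""
    total = labelled + unlabelled
    if total <= 0:
        raise ValueError("peptide has no signal in either channel")
    return labelled / total


def sample_passes(peptide_intensities, threshold):
    """Return True if the median peptide incorporation exceeds the threshold.

    peptide_intensities: iterable of (labelled, unlabelled) intensity pairs.
    threshold: fraction between 0 and 1, chosen by the experimenter.
    """
    values = [incorporation(lab, unlab) for lab, unlab in peptide_intensities]
    return median(values) > threshold


# Hypothetical intensities for three peptides from one labelled sample.
peptides = [(970.0, 30.0), (990.0, 10.0), (950.0, 50.0)]
print(sample_passes(peptides, threshold=0.95))
```

a per-peptide criterion, or a mean instead of a median, would be equally defensible; the median is used here only because it is less sensitive to a few poorly quantified peptides.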
cells were then washed three times in ice-cold pbs, mixed in a : : ratio, resuspended in sds-containing lysis buffer ( % sds in mm triethylammonium bicarbonate buffer (teab)) and incubated on ice for min to ensure cell lysis. then, cell lysates were centrifuged at g for minutes at + °c and the supernatant was transferred to a clean tube. protein concentration was determined using the bca protein assay kit (thermo, # ), and mg of protein per experiment were reduced with mm dithiothreitol (dtt, sigma, #d ) for h at °c and alkylated with mm iodoacetamide (iaa, sigma, #i ) for min at rt. protein was then precipitated using six volumes of chilled (- °c) acetone overnight. after precipitation, the protein pellet was resuspended in ml of mm teab and digested with trypsin ( : w/w, thermo, # ) overnight at °c. then, samples were cleared by centrifugation at g for min at + °c, and peptide concentration was quantified with the quantitative colorimetric peptide assay (thermo, # ). phosphopeptide enrichment in the peptide fractions generated as described above was carried out using magresyn ti-imac following manufacturer instructions ( bscientific, mrtim ).

high ph reverse phase fractionation for phosphoproteomics: samples were dissolved in μl of mm ammonium formate buffer ph . and peptides were fractionated using high ph rp chromatography. a c column from waters (xbridge peptide beh, Å, . µm . x mm, ireland) with a guard column (xbridge, c , . µm, . x mm, waters) was used on an ultimate hplc (thermo-scientific).
buffers a and b used for fractionation consist, respectively of mm ammonium formate in milliq water (buffer a) and mm ammonium formate in % acetonitrile (buffer b), both buffers were adjusted to ph . with ammonia. fractions are collected using a wps- fc autosampler (thermo-scientific) at min intervals. column and guard column were equilibrated with % buffer b for min at a constant flow rate of . ml/min and a constant temperature f oc. samples ( µl) are loaded onto the column at . ml/min, and separation gradient started from % buffer b, to % b in min, then from % b to % b within min and finaly from % b to % b in min. the column is washed for min at % buffer b and equilibrated at % buffer b for min as mentioned above. the fraction collection started min after injection and stopped after min (total of fractions, µl each). each peptide fraction was acidified immediately after elution from the column by adding to µl % formic acid to each tube in the autosampler. the total number of fractions concatenated was set to . the content of fractions from each set was dried prior to further analysis. lc-ms/ms analysis: lc-ms analysis was done at the fingerprints proteomics facility (university of dundee). analysis of peptide readout was performed on a q exactive™ plus, mass spectrometer (thermo scientific) coupled with a dionex ultimate rs (thermo scientific). lc buffers used are the following: buffer a ( . % formic acid in milli-q water (v/v)) and buffer b ( % acetonitrile and . % formic acid in milli-q water (v/v). dried fractions were resuspended in µl, % formic acid and aliquots of μl of each fraction were loaded at μl/min onto a trap column ( μm × cm, pepmap nanoviper c column, μm, Å, thermo scientific) equilibrated in . % tfa. the trap column was washed for min at the same flow rate with . % tfa and then switched in-line with a thermo scientific, resolving c column ( μm × cm, pepmap rslc c column, μm, Å). the peptides were eluted from the column .cc-by . 
at a constant flow rate of nl/min with a linear gradient from % buffer b to % buffer b in min, then from % buffer b to % buffer b in min, and finally from % buffer b to % buffer b in min. the column was then washed with % buffer b for min and re-equilibrated in % buffer b for min. the column was kept at a constant temperature of °c. the q-exactive plus was operated in data-dependent positive ionization mode. the source voltage was set to . kv and the capillary temperature was °c. a scan cycle comprised ms scan (m/z range from - , ion injection time of ms, resolution and automatic gain control (agc) x ) acquired in profile mode, followed by sequential dependent ms scans (resolution ) of the most intense ions fulfilling predefined selection criteria (agc x , maximum ion injection time ms, isolation window of . m/z, fixed first mass of m/z, spectrum data type: centroid, intensity threshold x , exclusion of unassigned, singly and > charged precursors, peptide match preferred, exclude isotopes on, dynamic exclusion time s). the hcd collision energy was set to % of the normalized collision energy. mass accuracy was checked before the start of sample analysis.

mass spectrometry data analysis: q exactive plus mass spectrometer .raw files were analyzed, and peptides and proteins quantified, using maxquant ( ) with the built-in search engine andromeda ( ). all settings were left at their defaults, except for the minimal peptide length of , and the andromeda search engine was configured for the uniprot homo sapiens protein database (release date: _ ).
only peptide and protein ratios quantified in at least two out of the three replicates were considered, and the p-values were determined by student’s t test and corrected for multiple testing using the benjamini–hochberg procedure (benjamini and hochberg, ).

plasmid constructs: for single molecule fluorescence microscopy, a monomeric non-fluorescent (y f) variant of egfp was n-terminally fused to gp . this tag (mxfpm) was engineered to specifically bind the anti-gfp nanobody “minimizer” (agfp-minb). this construct was inserted into a modified version of psems- m (covalys) using a signal peptide of igk. the orf was linked to a neomycin resistance cassette via an ires site. a mxfpe-il- ra construct was designed likewise but is recognized by the agfp nanobody “enhancer” (mxfpe). the chimeric construct mxfp-il- ra (ecd & tmd)-gp (icd) was a fusion construct of il- ra (aa - ) and gp (aa - ).

cell lines and media: hela cells were grown in dmem containing % v/v fbs, penicillin-streptomycin, and l-glutamine ( mm). rpe cells were grown in dmem/f containing % v/v fbs, penicillin-streptomycin, and l-glutamine ( mm). rpe cells were stably transfected with mxfpe-il- ra, mutants and the chimeric construct by the pei method according to standard protocols. using g selection ( . mg/ml), individual clones were selected, proliferated and characterized. to compare receptor cell surface expression levels of stable clones expressing variants of il- ra, cells were detached using pbs + mm edta, spun down ( g, min) and incubated with “enhancer” agfp-ennbdy ( nm, min on ice). after incubation, cells were washed with pbs and run on the cytometer.
flow cytometry staining and antibodies: for measuring dose-response curves of stat / phosphorylation (either th- cells or rpe clones), -well plates were prepared with µl of cell suspensions at x cells/ml/well for th- and x cells/ml/well for rpe . the latter were detached using accutase (sigma). to obtain dose-response curves, cells were stimulated with a set of different cytokine concentrations for min at °c, followed by pfa fixation ( %) for min at rt. for kinetic experiments, cell suspensions were stimulated with a defined, saturating concentration of cytokines ( nm il- , nm hypil- , nm wt-il- ) in reverse order so that all cell suspensions were pfa-fixed ( %) simultaneously. for pstat / kinetic experiments under jak inhibition, tofacitinib ( μm, stratech, #s -sel) was added after min of stimulation and cells were pfa-fixed in correct order. after fixation ( min at rt), cells were spun down at g for min at °c. cell pellets were resuspended and permeabilized in ice-cold methanol and kept for min on ice. after permeabilization, cells were fluorescently barcoded according to ( ). in brief, using two nhs-dyes (pacificblue, # , dylight , # , thermo scientific), individual wells were stained with a combination of different concentrations of these dyes. after barcoding, cells were pooled and stained with anti-pstat alexa (cell signaling technologies, # ) and anti-pstat alexa (biolegend, # ) at a : dilution in pbs + . % bsa for h at rt. t-cells were also stained with anti-cd alexafluor ( : , biolegend, # ), anti-cd pe ( : , biolegend, # ) and anti-cd brilliantviolet ( : , biolegend, # ). cells were analyzed on the flow cytometer (beckman coulter, cytoflex s) and individual cell populations were identified by their barcoding pattern. mean fluorescence intensity (mfi) of pstat and pstat was measured for all individual cell populations.
for measuring total stat levels, methanol-permeabilized cells were stained with anti-stat alexa ( : , biolegend, # ) or anti-stat apc ( : , biolegend, # ). for measuring total irf levels, methanol-permeabilized cells were stained with anti-irf alexa ( : , biolegend, # ). for measuring cell surface levels of gp , cells were detached with accutase (sigma) and stained with anti-gp apc ( : , biolegend, # ) for h on ice.

rna transcriptome sequencing: human th- cells from three donors each (stemcell technologies) were cultivated and stimulated as described above. cells were washed in hank’s balanced salt solution (hbss, gibco) and snap frozen for storage. rna was isolated using the rneasy kit (qiagen) according to the manufacturer’s protocol. all rna / ratios were above . . from each sample, μg of rna was used. transcriptomic analysis was done by novogene as follows. sequencing libraries were generated using the nebnext® ultra™ rna library prep kit for illumina® (neb, usa) following the manufacturer’s recommendations, and index codes were added to attribute sequences to each sample. briefly, mrna was purified from total rna using poly-t oligo-attached magnetic beads. fragmentation was carried out using divalent cations under elevated temperature in nebnext first strand synthesis reaction buffer ( x). first strand cdna was synthesized using random hexamer primers and m-mulv reverse transcriptase (rnase h-). second strand cdna synthesis was subsequently performed using dna polymerase i and rnase h. remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. after adenylation of the ’ ends of dna fragments, nebnext adaptors with a hairpin loop structure were ligated to prepare for hybridization. in order to select
cdna fragments of preferentially ~ bp in length, the library fragments were purified with the ampure xp system (beckman coulter, beverly, usa). then μl user enzyme (neb, usa) was used with size-selected, adaptor-ligated cdna at °c for min followed by min at °c before pcr. pcr was then performed with phusion high-fidelity dna polymerase, universal pcr primers and index (x) primer. finally, pcr products were purified (ampure xp system) and library quality was assessed on the agilent bioanalyzer system.

rna sequencing data analysis: primary data analysis for quality control, mapping to the reference genome and quantification was conducted by novogene as outlined below.

quality control: raw data (raw reads) in fastq format were first processed through in-house scripts. in this step, clean data (clean reads) were obtained by removing reads containing adapter and poly-n sequences and low-quality reads from the raw data. at the same time, the q , q and gc content of the clean data were calculated. all downstream analyses were based on the high-quality clean data.

mapping to reference genome: reference genome and gene model annotation files were downloaded from the genome website browser (ncbi/ucsc/ensembl) directly. paired-end clean reads were mapped to the reference genome using hisat software. hisat uses a large set of small gfm indexes that collectively cover the whole genome. these small indexes (called local indexes), combined with several alignment strategies, enable rapid and accurate alignment of sequencing reads.

quantification: htseq was used to count the reads mapped to each gene, including known and novel genes. the rpkm of each gene was then calculated based on the length of the gene and the read count mapped to it.
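the rpkm calculation mentioned above can be sketched in a few lines. this is a minimal illustration and not the provider's pipeline: the function name `rpkm` and the example counts are hypothetical, and the formula is the standard definition (read count scaled per kilobase of gene length and per million mapped reads).

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # reads / (gene length in kb) / (mapped reads in millions)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example values, chosen only to show the scaling:
# 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
example_value = rpkm(500, 2000, 10_000_000)
print(example_value)
```

doubling the gene length or the library size halves the value, which is how the measure corrects for both effects at once.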
rpkm (reads per kilobase of exon model per million mapped reads) accounts for the effects of sequencing depth and gene length on the read count at the same time and is currently the most commonly used method for estimating gene expression levels. for each identified gene, the fold change was calculated as the ratio of cytokine-stimulated to unstimulated expression levels within each donor, and an unpaired, two-tailed t test was applied to calculate p values. genes were considered to be significantly altered if the p value was ≤ . and the log fold change was ≥ + or ≤ − . genes with an rpkm of less than in two or more donors were excluded from the analysis so as to remove genes with abundance near the detection limit. genes without annotated function were also removed. functional annotation of genes (kegg pathways, go terms) was done using the david bioinformatics resource functional annotation tool ( , ). the clustered heatmap was generated using the r studio pheatmap package.

sirna-mediated knockdown of irf in rpe cells: a set of four irf -sirnas was purchased from dharmacon and tested individually to determine the level of knockdown achieved. the sirna providing the highest level of irf knockdown (horizon, lq- - - , sirna # : ugaacucccugccagauau) was subsequently used in all the experiments. rpe -il ra cells were plated in -well dishes ( . x cells per well) and transfected the next day with irf -sirna or control-gapdh sirna (horizon, d- - - ) (dharmacon) using dharmafect transfection reagent (dharmacon) following the manufacturer’s instructions for h. at different timepoints of il- ( nm) or hypil- ( nm) stimulation, samples were collected from one -well each. cells were
trypsinized and each sample was spun down and pellets snap-frozen in liquid nitrogen for subsequent rna isolation ( %) or pfa-fixed for total irf staining ( %) by flow cytometry.

real-time quantitative pcr: cells were subjected to rna isolation using the qiagen rneasy kit. rna ( ng) was reverse transcribed to complementary dna (cdna) using an iscript cdna synthesis kit (biorad, # ), which was used as template for quantitative pcr. powertrack™ sybr green master mix (takara, #a ) was used for the reaction with the following primers: b-actin was used as housekeeping gene for normalization. each sirna knockdown experiment was performed in three replicates, with each sample for qpcr being run in two technical replicates.

mathematical models and bayesian inference: we developed two new mathematical models, making use of ordinary differential equations (odes), for the initial steps of cytokine-receptor binding, dimer formation and signal activation by hypil- and il- , respectively; namely, a set of odes for the hypil- system and a separate set of odes for the il- system (see end of this section for the set of odes included in each model). these odes describe the rate of change over time of the concentration of each molecular species considered in the receptor-ligand systems (hypil- and il- ). by solving these odes, a time-course for the concentration of total (free and bound) phosphorylated stat and stat can be obtained and compared to the experimental data (supp. fig. b & c). the hypil- and il- mathematical models differ in the reactions involved in the formation of the signaling dimer for each cytokine. under stimulation with hypil- , two hypil- bound gp monomers are required to form the homodimer (supp. fig. a), whereas under il- stimulation, we assume that il- binds to the il- ra chain and not to gp (supp. fig.
b) and hence the heterodimer comprises an il- molecule bound to an il- ra monomer and one gp chain. in the mathematical models, we assume that upon formation of the dimers (homo- or heterodimer), these receptor chains become immediately phosphorylated. the models do not consider jak molecules explicitly. we assume that these molecules are constitutively bound to their corresponding receptor chains and that they phosphorylate immediately upon receptor phosphorylation (dimer formation). after the formation of the dimer, which we denote by 𝐷) or 𝐷"*, formed by hypil- or il- respectively, the biochemical reactions included in each mathematical model are similar, and are summarized as follows. table provides a description of the rates for each reaction considered in each (and both) mathematical model(s). in what follows we assume mass action kinetics for all the reactions. a free cytoplasmic unphosphorylated stat or stat molecule can bind to either receptor chain in the dimer, provided that the intracellular tyrosine residue of the receptor in the dimer is free (supp. fig. c & d). the stat or stat molecule can subsequently dissociate from the receptor chain in the dimer or can become phosphorylated (with rate 𝑞) whilst bound to the dimer.

primers used for real-time quantitative pcr:
target | for | rev | size
b-actin | catgtacgttgctatccaggc | ctccttaatgtcacgcacgat | bp
stat | ctagtggagtggaagcggag | caccacaaacgagctctgaa | bp
gbp | tcctcggattattgctcggc | cctttgcgcttcagcctttt | bp
oas | gaaggcagctcacgaaacc | aggcctcagcctcttgtg | bp
socs | gtccccccagaagagcctatta | ttgacggtcttccgacagagat |

we have assumed that the rate of stat or stat phosphorylation when bound does not depend on the stat type ( or ) or on the receptor chain (supp. fig. c & d). phosphorylated stat (pstat ) and stat (pstat ) molecules can dissociate from the dimer. once free in the cytoplasm, they can then dephosphorylate (supp. fig. g). we have assumed that this rate of stat dephosphorylation only depends on the concentration of the respective pstat type, free in the cytoplasm. we note that no allostery has been considered in the models and hence, phosphorylated and unphosphorylated stat molecules dissociate from the receptor with the same rate (supp. fig. c & d). finally, any molecular species containing receptor molecules can be removed from the system, due to internalisation or degradation, via one of two hypothesised mechanisms (supp. fig. e & f): • hypothesis (h ): receptors (free or bound, phosphorylated or unphosphorylated) are internalised/degraded with a rate proportional to the concentration of the species in which they are contained, or • hypothesis (h ): receptors (free or bound, phosphorylated or unphosphorylated) are internalised/degraded with a rate proportional to the product of the concentration of the species in which they are contained and the sum of the concentrations of free cytoplasmic phosphorylated stat and stat . we note that hypothesis assumes that receptor molecules (free or bound, phosphorylated or unphosphorylated) are being internalised/degraded as part of the natural cellular trafficking cycle. hypothesis is consistent with a potential feedback mechanism, whereby the free cytoplasmic pstat molecules would migrate to the nucleus and increase the production of negative feedback proteins, such as socs , which down-regulate cytokine signaling. 
thus, the internalisation/degradation rate of receptor molecules (free or bound, phosphorylated or unphosphorylated) under hypothesis increases with the total amount of free cytoplasmic phosphorylated stat and stat , to account for this surface receptor down-regulation. a depiction of the reactions in both the hypil- and il- mathematical models and under each hypothesis is given in supp. fig. where a), c), e) and g) describe the hypil- model and b), d), f) and g) describe the il- model. in this figure, 𝑖 ∈ { , } so that the reactions shown can either involve stat or stat . above or below the reaction arrows is a symbol which represents the rate at which the reaction occurs (under the assumption of mass action kinetics). the notation for the rate constants and initial concentrations in the models, along with their descriptions and units, is given in table .

parameter | description | unit
𝑟#,) & ,𝑟#,"* & | rate of receptor-ligand binding | nm- s-
𝑟#,) , ,𝑟#,"* , | rate of receptor-ligand dissociation | s-
𝑟",) & ,𝑟","* & | rate of monomers binding to form a dimer | nm- s-
𝑟",) , ,𝑟","* , | rate of dissociation of the dimer | s-
𝑘$% & | rate of stat𝑖 binding to gp | nm- s-
𝑘$' & | rate of stat𝑖 binding to il- ra | nm- s-
𝑘$% , | rate of stat𝑖 dissociating from gp | s-
𝑘$' , | rate of stat𝑖 dissociating from il- ra | s-
𝑞 | rate of stat phosphorylation on the dimer | s-
𝑑$ | rate of free pstat𝑖 dephosphorylation | s-
𝛽),𝛽"* | rate of receptor internalisation/degradation under hypothesis | s-
𝛾),𝛾"* | rate of receptor internalisation/degradation under hypothesis | nm- s-
[𝑅#( )] | initial concentration of gp | nm
[𝑅"( )] | initial concentration of il- rα | nm
[𝑆$( )] | initial concentration of stat𝑖 | nm

table : notation, definitions and units for the parameter values used in the mathematical models, where 𝑖 ∈ { , } so that stat𝑖 corresponds to stat or stat .

the hypil- mathematical model was formulated based on reactions involving the following species:
• 𝐿) = hypil- ,
• 𝑅# = gp ,
• 𝐶# = gp - hypil- monomer,
• 𝐷) = phosphorylated gp - hypil- - hypil- - gp homodimer,
• 𝑆# = unbound cytoplasmic unphosphorylated stat ,
• 𝑆( = unbound cytoplasmic unphosphorylated stat ,
• 𝐷) ⋅ 𝑆# = dimer bound to stat ,
• 𝐷) ⋅ 𝑆( = dimer bound to stat ,
• 𝐷) ⋅ 𝑝𝑆# = dimer bound to pstat ,
• 𝐷) ⋅ 𝑝𝑆( = dimer bound to pstat ,
• 𝑆# ⋅ 𝐷) ⋅ 𝑆# = dimer bound to two molecules of stat ,
• 𝑝𝑆# ⋅ 𝐷) ⋅ 𝑆# = dimer bound to two molecules of stat , one of which is phosphorylated,
• 𝑝𝑆# ⋅ 𝐷) ⋅ 𝑝𝑆# = dimer bound to two molecules of pstat ,
• 𝑆( ⋅ 𝐷) ⋅ 𝑆( = dimer bound to two molecules of stat ,
• 𝑝𝑆( ⋅ 𝐷) ⋅ 𝑆( = dimer bound to two molecules of stat , one of which is phosphorylated,
• 𝑝𝑆( ⋅ 𝐷) ⋅ 𝑝𝑆( = dimer bound to two molecules of pstat ,
• 𝑆# ⋅ 𝐷) ⋅ 𝑆( = dimer bound to one molecule of stat and one of stat ,
• 𝑝𝑆# ⋅ 𝐷) ⋅ 𝑆( = dimer bound to one molecule of pstat and one of stat ,
• 𝑆# ⋅ 𝐷) ⋅ 𝑝𝑆( = dimer bound to one molecule of stat and one of pstat ,
• 𝑝𝑆# ⋅ 𝐷) ⋅ 𝑝𝑆( = dimer bound to one molecule of pstat and one of pstat ,
• 𝑝𝑆# = unbound cytoplasmic phosphorylated stat ,
• 𝑝𝑆( = unbound cytoplasmic phosphorylated stat .
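the two internalisation/degradation hypotheses introduced earlier differ only in how the removal rate of a receptor-containing species is computed. the sketch below illustrates that difference under mass action. the function name, the mode labels `"constant"` and `"feedback"`, and the numerical values are hypothetical; `beta` and `gamma` stand in for the hypothesis-specific rate constants listed in the parameter table.

```python
def removal_rate(species_conc, mode, beta=0.0, gamma=0.0, free_pstat_concs=()):
    """Removal (internalisation/degradation) rate of one receptor-containing species.

    mode="constant": rate proportional to the species concentration only
                     (the first hypothesis in the text).
    mode="feedback": rate proportional to the species concentration times the
                     summed concentration of free cytoplasmic phosphorylated
                     STATs (the second hypothesis in the text).
    """
    if mode == "constant":
        return beta * species_conc
    if mode == "feedback":
        return gamma * sum(free_pstat_concs) * species_conc
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical numbers, for illustration only.
print(removal_rate(2.0, "constant", beta=0.1))
print(removal_rate(2.0, "feedback", gamma=0.5, free_pstat_concs=(1.0, 3.0)))
```

in the feedback mode the removal rate is zero until free phosphorylated stat has accumulated, which is what makes it a candidate negative feedback mechanism.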
the initial reactions in the hypil- signaling pathway can then be described by the odes ( ) – ( ), under the law of mass action, where the terms involving the parameter 𝛽) apply only to the model under hypothesis and the terms involving the parameter 𝛾) apply only to the model under hypothesis . square brackets around a species denote the concentration of that species in units of nm, and “⋅” denotes a bond between two molecules/species. the odes are valid for any time 𝑡, with 𝑡 ≥ , but time has been omitted from the species concentrations for ease of notation. we note here that, for example, [𝑅#] = [𝑅#](𝑡) for all 𝑡 ≥ .
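to make the structure of such a mass-action system concrete, the following is a heavily reduced sketch covering only ligand binding, homodimerisation and removal proportional to concentration. it is not the authors' model: stat binding and phosphorylation are omitted, all parameter names and values are hypothetical, and the factor of two with the squared monomer concentration is the textbook mass-action form for homodimerisation, assumed here rather than taken from the source equations. a hand-written fourth-order runge-kutta step is used only to keep the example dependency-free.

```python
def rhs(state, p):
    """Right-hand side for (R, L, C, D): receptor, ligand, monomer complex, dimer."""
    R, L, C, D = state
    bind = p["k_on"] * R * L - p["k_off"] * C          # R + L <-> C
    dimerise = p["d_on"] * C * C - p["d_off"] * D      # C + C <-> D (assumed form)
    return (
        -bind - p["beta"] * R,
        -bind,                                          # free ligand is not internalised
        bind - 2.0 * dimerise - p["beta"] * C,
        dimerise - p["beta"] * D,
    )


def rk4_step(state, p, dt):
    """One classical Runge-Kutta step of size dt."""
    def shifted(k, scale):
        return tuple(s + scale * ki for s, ki in zip(state, k))

    k1 = rhs(state, p)
    k2 = rhs(shifted(k1, dt / 2), p)
    k3 = rhs(shifted(k2, dt / 2), p)
    k4 = rhs(shifted(k3, dt), p)
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate(state, p, dt, n_steps):
    """Integrate from `state` and return the list of visited states."""
    trajectory = [state]
    for _ in range(n_steps):
        state = rk4_step(state, p, dt)
        trajectory.append(state)
    return trajectory


# Hypothetical parameters and initial concentrations (arbitrary units).
params = {"k_on": 0.1, "k_off": 0.01, "d_on": 0.05, "d_off": 0.01, "beta": 0.0}
initial = (1.0, 10.0, 0.0, 0.0)
final = simulate(initial, params, dt=0.01, n_steps=1000)[-1]
print(final)
```

with `beta` set to zero, the totals of receptor (R + C + 2D) and ligand (L + C + 2D) are conserved, which gives a quick sanity check on any implementation of the binding and dimerisation terms.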
/ 𝑑[𝑅 ] 𝑑𝑡 = −𝑟 , + [𝑅 ][𝐿)] + 𝑟 , − [𝐶 ] − 𝛽 [𝑅 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝑅 ] ( ) 𝑑[𝐿)] 𝑑𝑡 = −𝑟 , + [𝑅 ][𝐿)] + 𝑟 , − [𝐶 ] ( ) 𝑑[𝐶 ] 𝑑𝑡 = 𝑟 , + [𝑅 ][𝐿)] − 𝑟 , − [𝐶 ] − 𝑟 , + [𝐶 ] + 𝑟 , − [𝐷 ] − 𝛽 [𝐶 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐶 ] ( ) 𝑑[𝐷 ] 𝑑𝑡 = 𝑟 , + [𝐶 ] − 𝑟 , − [𝐷 ] − 𝑘 𝑎 + [𝐷 ][𝑆 ] + 𝑘 𝑎 − ([𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ]) − 𝑘 𝑎 + [𝐷 ][𝑆 ] + 𝑘 𝑎 − ([𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ]) − 𝛽 [𝐷 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐷 ] ( ) 𝑑[𝑆 ] 𝑑𝑡 = −𝑘 𝑎 + [𝑆 ]( [𝐷 ] + [𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ]) + 𝑘 𝑎 − ([𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ]) + 𝑑 [𝑝𝑆 ] ( ) 𝑑[𝑆 ] 𝑑𝑡 = −𝑘 𝑎 + [𝑆 ]( [𝐷 ] + [𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ]) + 𝑘 𝑎 − ([𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑝𝑆# ⋅ 𝐷) ⋅ 𝑆(]) + 𝑑 [𝑝𝑆 ] ( ) 𝑑[𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ] − 𝑘 𝑎 − [𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 + [𝐷 ⋅ 𝑆 ][𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 + [𝐷 ⋅ 𝑆 ][𝑆 ] + 𝑘 𝑎 − [𝑆# ⋅ 𝐷 ⋅ 𝑆(] − 𝑞[𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝛽 [𝐷 ⋅ 𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐷 ⋅ 𝑆 ] ( ) 𝑑[𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ] − 𝑘 𝑎 − [𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 + [𝐷 ⋅ 𝑆 ][𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 + [𝐷 ⋅ 𝑆 ][𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑞[𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 − [𝑝𝑆# ⋅ 𝐷) ⋅ 𝑆(] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝛽 [𝐷 ⋅ 𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐷 ⋅ 𝑆 ] ( ) 𝑑[𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = −𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑝𝑆# ⋅ 𝐷) ⋅ 𝑆(] + 𝑞[𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 − [𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝛽 [𝐷 ⋅ 𝑝𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐷 ⋅ 𝑝𝑆 ] ( ) 𝑑[𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = −𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑞[𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 − [𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝛽 [𝐷 ⋅ 𝑝𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐷 ⋅ 𝑝𝑆 ] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑞[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝛽 [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑞[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝛽 [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑝𝑆 ⋅ 𝐷 ][𝑆 ] − 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑞[𝑆) ⋅ 𝐷* ⋅ 𝑆)] − 
𝑞[𝑝𝑆) ⋅ 𝐷* ⋅ 𝑆)] − 𝛽*[𝑝𝑆) ⋅ 𝐷* ⋅ 𝑆)] ( ) .cc-by . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . / −𝛾*([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆) ⋅ 𝐷* ⋅ 𝑆)] 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑝𝑆 ⋅ 𝐷 ][𝑆 ] − 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑞[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑞[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝛽 [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑞[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] −𝛽*[𝑝𝑆) ⋅ 𝐷* ⋅ 𝑝𝑆)] − 𝛾*([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆) ⋅ 𝐷* ⋅ 𝑝𝑆)] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑞[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] −𝛽*[𝑝𝑆+ ⋅ 𝐷* ⋅ 𝑝𝑆+] − 𝛾*([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆+ ⋅ 𝐷* ⋅ 𝑝𝑆+] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 + [𝑆 ⋅ 𝐷 ][𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑞[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝛽 [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑞[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 + [𝑝𝑆 ⋅ 𝐷 ][𝑆 ] −𝑘+,- [𝑝𝑆) ⋅ 𝐷* ⋅ 𝑆+] − 𝑞[𝑝𝑆) ⋅ 𝐷* ⋅ 𝑆+] − 𝑘),- [𝑝𝑆) ⋅ 𝐷* ⋅ 𝑆+] −𝛽*[𝑝𝑆) ⋅ 𝐷* ⋅ 𝑆+] − 𝛾*([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆) ⋅ 𝐷* ⋅ 𝑆+] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑞[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑝𝑆 ] −𝑘),- [𝑆) ⋅ 𝐷* ⋅ 𝑝𝑆+] − 𝑞[𝑆) ⋅ 𝐷* ⋅ 𝑝𝑆+] − 𝑘+,- [𝑆) ⋅ 𝐷* ⋅ 𝑝𝑆+] −𝛽*[𝑆) ⋅ 𝐷* ⋅ 𝑝𝑆+] − 𝛾*([𝑝𝑆)] + [𝑝𝑆+])[𝑆) ⋅ 𝐷* ⋅ 𝑝𝑆+] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑞([𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ]) −[𝑝𝑆) ⋅ 𝐷* ⋅ 𝑝𝑆+](𝑘),- + 𝑘+,- ) − 𝛽*[𝑝𝑆) ⋅ 𝐷* ⋅ 𝑝𝑆+] −𝛾*([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆) ⋅ 𝐷* ⋅ 𝑝𝑆+] ( ) 𝑑[𝑝𝑆 ] 𝑑𝑡 = 𝑘 𝑎 − ([𝐷 ⋅ 𝑝𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ]) − 𝑑 [𝑝𝑆 ] ( ) 𝑑[𝑝𝑆 ] 𝑑𝑡 = 𝑘 𝑎 − ([𝐷 ⋅ 𝑝𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ]) − 𝑑 [𝑝𝑆 ] ( ) similarly, and with some species in common with the hypil- model, the il- model has been formulated based on reactions involving the following species: • 𝐿"* = il- , • 𝑅# = gp , • 𝑅" = il- ra, • 𝐶" = il- ra - il- monomer, • 𝐷"* = phosphorylated il- ra - il- 
- gp heterodimer,
• 𝑆# = unbound cytoplasmic unphosphorylated stat ,
• 𝑆( = unbound cytoplasmic unphosphorylated stat ,
• 𝑆# ⋅ 𝐷"* = dimer bound to stat via 𝑅#,
• 𝑆( ⋅ 𝐷"* = dimer bound to stat via 𝑅#,
• 𝑝𝑆# ⋅ 𝐷"* = dimer bound to pstat via 𝑅#,
• 𝑝𝑆( ⋅ 𝐷"* = dimer bound to pstat via 𝑅#,
• 𝐷"* ⋅ 𝑆# = dimer bound to stat via 𝑅",
• 𝐷"* ⋅ 𝑆( = dimer bound to stat via 𝑅",
• 𝐷"* ⋅ 𝑝𝑆# = dimer bound to pstat via 𝑅",
• 𝐷"* ⋅ 𝑝𝑆( = dimer bound to pstat via 𝑅",
• 𝑆# ⋅ 𝐷"* ⋅ 𝑆# = dimer bound to two molecules of stat ,
• 𝑝𝑆# ⋅ 𝐷"* ⋅ 𝑆# = dimer bound to two molecules of stat , one of them phosphorylated on 𝑅#,
• 𝑆# ⋅ 𝐷"* ⋅ 𝑝𝑆# = dimer bound to two molecules of stat , one of them phosphorylated on 𝑅",
• 𝑝𝑆# ⋅ 𝐷"* ⋅ 𝑝𝑆# = dimer bound to two molecules of pstat ,
• 𝑆( ⋅ 𝐷"* ⋅ 𝑆( = dimer bound to two molecules of stat ,
• 𝑝𝑆( ⋅ 𝐷"* ⋅ 𝑆( = dimer bound to two molecules of stat , one of them phosphorylated on 𝑅#,
• 𝑆( ⋅ 𝐷"* ⋅ 𝑝𝑆( = dimer bound to two molecules of stat , one of them phosphorylated on 𝑅",
• 𝑝𝑆( ⋅ 𝐷"* ⋅ 𝑝𝑆( = dimer bound to two molecules of pstat ,
• 𝑆# ⋅ 𝐷"* ⋅ 𝑆( = dimer bound to stat via 𝑅# and stat via 𝑅",
• 𝑆( ⋅ 𝐷"* ⋅ 𝑆# = dimer bound to stat via 𝑅" and stat via 𝑅#,
• 𝑝𝑆# ⋅ 𝐷"* ⋅ 𝑆( = dimer bound to pstat via 𝑅# and stat via 𝑅",
• 𝑆( ⋅ 𝐷"* ⋅ 𝑝𝑆# = dimer bound to pstat via 𝑅" and stat via 𝑅#,
• 𝑆# ⋅ 𝐷"* ⋅ 𝑝𝑆( = dimer bound to stat via 𝑅# and pstat via 𝑅",
• 𝑝𝑆( ⋅ 𝐷"* ⋅ 𝑆# = dimer bound to stat via 𝑅" and pstat via 𝑅#,
• 𝑝𝑆# ⋅ 𝐷"* ⋅ 𝑝𝑆( = dimer bound to pstat via 𝑅# and pstat via 𝑅",
• 𝑝𝑆( ⋅ 𝐷"* ⋅ 𝑝𝑆# = dimer bound to pstat via 𝑅# and pstat via 𝑅",
• 𝑝𝑆# = unbound cytoplasmic phosphorylated stat
, • 𝑝𝑆( = unbound cytoplasmic phosphorylated stat .

again, under the law of mass action, the initial reactions in the il- signaling pathway can be described by the odes ( ) – ( ).

𝑑[𝑅 ] 𝑑𝑡 = −𝑟 , + [𝐶 ][𝑅 ] + 𝑟 , − [𝐷 ] − 𝛽 [𝑅 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝑅 ] ( )

𝑑[𝑅 ] 𝑑𝑡 = −𝑟 , + [𝑅 ][𝐿 ] + 𝑟 , − [𝐶 ] − 𝛽 [𝑅 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝑅 ] ( )

𝑑[𝐿 ] 𝑑𝑡 = −𝑟 , + [𝑅 ][𝐿 ] + 𝑟 , − [𝐶 ] ( )

𝑑[𝐶 ] 𝑑𝑡 = 𝑟 , + [𝑅 ][𝐿 ] − 𝑟 , − [𝐶 ] − 𝑟 , + [𝐶 ][𝑅 ] + 𝑟 , − [𝐷 ] − 𝛽 [𝐶 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐶 ] ( )

𝑑[𝐷 ] 𝑑𝑡 = 𝑟 , + [𝐶 ][𝑅 ] − 𝑟 , − [𝐷 ] − (𝑘 𝑎 + + 𝑘 𝑏 + )[𝐷 ][𝑆 ] + 𝑘 𝑎 − ([𝑆 ⋅ 𝐷 ] + [𝑝𝑆 ⋅ 𝐷 ]) + 𝑘 𝑏 − ([𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ]) − (𝑘 𝑎 + + 𝑘 𝑏 + )[𝐷 ][𝑆 ] + 𝑘 𝑎 − ([𝑆 ⋅ 𝐷 ] + [𝑝𝑆 ⋅ 𝐷 ]) + 𝑘 𝑏 − ([𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ]) − 𝛽 [𝐷 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐷 ] ( )
/ 𝑑[𝑆 ] 𝑑𝑡 = −𝑘 𝑎 + [𝑆 ]([𝐷 ] + [𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ] + [𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ]) + 𝑘 𝑎 − ([𝑆 ⋅ 𝐷 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ]) − 𝑘 𝑏 + [𝑆 ]([𝐷 ] + [𝑆 ⋅ 𝐷 ] + [𝑝𝑆 ⋅ 𝐷 ] + [𝑆 ⋅ 𝐷 ] + [𝑝𝑆 ⋅ 𝐷 ]) + 𝑘 𝑏 − ([𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ]) + 𝑑 [𝑝𝑆 ] ( ) 𝑑[𝑆 ] 𝑑𝑡 = −𝑘 𝑎 + [𝑆 ]([𝐷 ] + [𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ] + [𝐷 ⋅ 𝑆 ] + [𝐷 ⋅ 𝑝𝑆 ]) + 𝑘 𝑎 − ([𝑆 ⋅ 𝐷 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ]) − 𝑘 𝑏 + [𝑆 ]([𝐷 ] + [𝑆 ⋅ 𝐷 ] + [𝑝𝑆 ⋅ 𝐷 ] + [𝑆 ⋅ 𝐷 ] + [𝑝𝑆 ⋅ 𝐷 ]) + 𝑘 𝑏 − ([𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ]) + 𝑑 [𝑝𝑆 ] ( ) 𝑑[𝑆 ⋅ 𝐷 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ] − 𝑞[𝑆 ⋅ 𝐷 ] − 𝑘 𝑏 + [𝑆 ][𝑆 ⋅ 𝐷 ] + 𝑘 𝑏 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑘 𝑏 + [𝑆 ][𝑆 ⋅ 𝐷 ] + 𝑘 𝑏 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑘 𝑏 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑏 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝛽 [𝑆 ⋅ 𝐷 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝑆 ⋅ 𝐷 ] ( ) 𝑑[𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑏 + [𝑆 ][𝐷 ] − 𝑘 𝑏 − [𝐷 ⋅ 𝑆 ] − 𝑞[𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝛽 [𝐷 ⋅ 𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐷 ⋅ 𝑆 ] ( ) 𝑑[𝑆 ⋅ 𝐷 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ] − 𝑞[𝑆 ⋅ 𝐷 ] − 𝑘 𝑏 + [𝑆 ][𝑆 ⋅ 𝐷 ] + 𝑘 𝑏 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑘 𝑏 + [𝑆 ][𝑆 ⋅ 𝐷 ] + 𝑘 𝑏 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑘 𝑏 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑏 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝛽 [𝑆 ⋅ 𝐷 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝑆 ⋅ 𝐷 ] ( ) 𝑑[𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑏 + [𝑆 ][𝐷 ] − 𝑘 𝑏 − [𝐷 ⋅ 𝑆 ] − 𝑞[𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝛽 [𝐷 ⋅ 𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐷 ⋅ 𝑆 ] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ] 𝑑𝑡 = −𝑘 𝑏 + [𝑝𝑆 ⋅ 𝐷 ][𝑆 ] + 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑘 𝑏 + [𝑝𝑆 ⋅ 𝐷 ][𝑆 ] + 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑞[𝑆 ⋅ 𝐷 ] − 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ] + 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝛽 [𝑝𝑆 ⋅ 𝐷 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝑝𝑆 ⋅ 𝐷 ] ( ) .cc-by . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. 
it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . / 𝑑[𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = −𝑘 𝑎 + [𝐷 ⋅ 𝑝𝑆 ][𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝑘 𝑎 + [𝐷 ⋅ 𝑝𝑆 ][𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑞[𝐷 ⋅ 𝑆 ] − 𝑘 𝑏 − [𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝛽 [𝐷 ⋅ 𝑝𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐷 ⋅ 𝑝𝑆 ] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ] 𝑑𝑡 = −𝑘 𝑏 + [𝑝𝑆 ⋅ 𝐷 ][𝑆 ] + 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] − 𝑘 𝑏 + [𝑝𝑆 ⋅ 𝐷 ][𝑆 ] + 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] + 𝑞[𝑆 ⋅ 𝐷 ] − 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ] + 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝛽 [𝑝𝑆 ⋅ 𝐷 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝑝𝑆 ⋅ 𝐷 ] ( ) 𝑑[𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = −𝑘 𝑎 + [𝐷 ⋅ 𝑝𝑆 ][𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝑘 𝑎 + [𝐷 ⋅ 𝑝𝑆 ][𝑆 ] + 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑞[𝐷 ⋅ 𝑆 ] − 𝑘 𝑏 − [𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + 𝑘 𝑎 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] − 𝛽 [𝐷 ⋅ 𝑝𝑆 ] − 𝛾 ([𝑝𝑆 ] + [𝑝𝑆 ])[𝐷 ⋅ 𝑝𝑆 ] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] +𝑘) [𝑆) ⋅ 𝐷 ][𝑆)] − 𝑘) - [𝑆) ⋅ 𝐷 ⋅ 𝑆)] − 𝑞[𝑆) ⋅ 𝐷 ⋅ 𝑆)] −𝛽 [𝑆) ⋅ 𝐷 ⋅ 𝑆)] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑆) ⋅ 𝐷 ⋅ 𝑆)] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑏 + [𝑝𝑆 ⋅ 𝐷 ][𝑆 ] − 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] +𝑞[𝑆) ⋅ 𝐷 ⋅ 𝑆)] − 𝑞[𝑝𝑆) ⋅ 𝐷 ⋅ 𝑆)] − 𝑘),- [𝑝𝑆) ⋅ 𝐷 ⋅ 𝑆)] −𝛽 [𝑝𝑆) ⋅ 𝐷 ⋅ 𝑆)] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆) ⋅ 𝐷 ⋅ 𝑆)] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑝𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] +𝑞[𝑆) ⋅ 𝐷 ⋅ 𝑆)] − 𝑞[𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆)] − 𝑘) - [𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆)] −𝛽 [𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆)] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆)] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑞([𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ]) −[𝑝𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆)](𝑘),- + 𝑘) - ) − 𝛽 [𝑝𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆)] −𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆)] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] +𝑘+ [𝑆+ ⋅ 𝐷 ][𝑆+] − 𝑘+ - [𝑆+ ⋅ 𝐷 ⋅ 𝑆+] − 𝑞[𝑆+ ⋅ 𝐷 ⋅ 𝑆+] −𝛽 [𝑆+ ⋅ 𝐷 ⋅ 𝑆+] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑆+ ⋅ 𝐷 ⋅ 𝑆+] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑏 + [𝑝𝑆 ⋅ 𝐷 ][𝑆 ] − 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] +𝑞[𝑆+ ⋅ 𝐷 ⋅ 𝑆+] − 𝑞[𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑆+] − 𝑘+,- [𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑆+] −𝛽 [𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑆+] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑆+] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑝𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] ( ) .cc-by . 
international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . / +𝑞[𝑆+ ⋅ 𝐷 ⋅ 𝑆+] − 𝑞[𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆+] − 𝑘+ - [𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆+] −𝛽 [𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆+] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆+] 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑞([𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ]) −[𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆+](𝑘+,- + 𝑘+ - ) − 𝛽 [𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆+] −𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆+] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] +𝑘+ [𝑆) ⋅ 𝐷 ][𝑆+] − 𝑘+ - [𝑆) ⋅ 𝐷 ⋅ 𝑆+] − 𝑞[𝑆) ⋅ 𝐷 ⋅ 𝑆+] −𝛽 [𝑆) ⋅ 𝐷 ⋅ 𝑆+] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑆) ⋅ 𝐷 ⋅ 𝑆+] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑆 ] +𝑘) [𝑆+ ⋅ 𝐷 ][𝑆)] − 𝑘) - [𝑆+ ⋅ 𝐷 ⋅ 𝑆)] − 𝑞[𝑆+ ⋅ 𝐷 ⋅ 𝑆)] −𝛽 [𝑆+ ⋅ 𝐷 ⋅ 𝑆)] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑆+ ⋅ 𝐷 ⋅ 𝑆)] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑏 + [𝑝𝑆 ⋅ 𝐷 ][𝑆 ] − 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] +𝑞[𝑆) ⋅ 𝐷 ⋅ 𝑆+] − 𝑞[𝑝𝑆) ⋅ 𝐷 ⋅ 𝑆+] − 𝑘),- [𝑝𝑆) ⋅ 𝐷 ⋅ 𝑆+] −𝛽 [𝑝𝑆) ⋅ 𝐷 ⋅ 𝑆+] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆) ⋅ 𝐷 ⋅ 𝑆+] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] 𝑑𝑡 = 𝑘 𝑏 + [𝑝𝑆 ⋅ 𝐷 ][𝑆 ] − 𝑘 𝑏 − [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ] +𝑞[𝑆+ ⋅ 𝐷 ⋅ 𝑆)] − 𝑞[𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑆)] − 𝑘+,- [𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑆)] −𝛽 [𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑆)] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑆)] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑝𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] +𝑞[𝑆) ⋅ 𝐷 ⋅ 𝑆+] − 𝑞[𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆+] − 𝑘+ - [𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆+] −𝛽 [𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆+] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆+] ( ) 𝑑[𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑘 𝑎 + [𝑆 ][𝐷 ⋅ 𝑝𝑆 ] − 𝑘 𝑎 − [𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] +𝑞[𝑆+ ⋅ 𝐷 ⋅ 𝑆)] − 𝑞[𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆)] − 𝑘) - [𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆)] −𝛽 [𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆)] − 𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆)] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑞([𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ]) −[𝑝𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆+](𝑘),- + 𝑘+ - ) − 𝛽 [𝑝𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆+] −𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆) ⋅ 𝐷 ⋅ 𝑝𝑆+] ( ) 𝑑[𝑝𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] 𝑑𝑡 = 𝑞([𝑆 ⋅ 𝐷 ⋅ 𝑝𝑆 ] + [𝑝𝑆 ⋅ 𝐷 ⋅ 𝑆 ]) −[𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆)](𝑘+,- + 𝑘) - ) − 𝛽 [𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆)] −𝛾 ([𝑝𝑆)] + [𝑝𝑆+])[𝑝𝑆+ ⋅ 𝐷 ⋅ 𝑝𝑆)] ( ) .cc-by . 
$$\frac{d[pS_1]}{dt} = k_{1a}^{-}\big([pS_1 \cdot D_{27}] + [pS_1 \cdot D_{27} \cdot S_1] + [pS_1 \cdot D_{27} \cdot pS_1] + [pS_1 \cdot D_{27} \cdot S_3] + [pS_1 \cdot D_{27} \cdot pS_3]\big) + k_{1b}^{-}\big([D_{27} \cdot pS_1] + [S_1 \cdot D_{27} \cdot pS_1] + [pS_1 \cdot D_{27} \cdot pS_1] + [S_3 \cdot D_{27} \cdot pS_1] + [pS_3 \cdot D_{27} \cdot pS_1]\big) - d_1 [pS_1]$$

$$\frac{d[pS_3]}{dt} = k_{3a}^{-}\big([pS_3 \cdot D_{27}] + [pS_3 \cdot D_{27} \cdot S_3] + [pS_3 \cdot D_{27} \cdot pS_3] + [pS_3 \cdot D_{27} \cdot S_1] + [pS_3 \cdot D_{27} \cdot pS_1]\big) + k_{3b}^{-}\big([D_{27} \cdot pS_3] + [S_3 \cdot D_{27} \cdot pS_3] + [pS_3 \cdot D_{27} \cdot pS_3] + [S_1 \cdot D_{27} \cdot pS_3] + [pS_1 \cdot D_{27} \cdot pS_3]\big) - d_3 [pS_3]$$

similarly to the hypil-6 model, the terms in equations ( ) - ( ) involving the parameter $\beta_{27}$ apply only to the model under hypothesis and the terms involving the parameter $\gamma_{27}$ apply only to the model under hypothesis .

we now describe how we used the experimental data (fig. b and c supp.) to parameterise the mathematical models described above. since the experimental outputs are levels of pstat1 and pstat3 as a function of time under hypil-6 and il-27 stimulation (fig. b and c supp.), we consider two model outputs of interest for the hypil-6 and il-27 mathematical models, which are proportional to the experimental data in supp. figure b and c: the sum of all molecular species (variables) containing phosphorylated stat1, free or bound ($[pS_1]_{T,j}$, for $j \in \{6, 27\}$), and the sum of all species (variables) containing phosphorylated stat3, free or bound ($[pS_3]_{T,j}$, for $j \in \{6, 27\}$).
the concentrations of the two model outputs of interest at any time $t$ are given by

$$[pS_1]_{T,6}(t) = [D_6 \cdot pS_1](t) + [pS_1 \cdot D_6 \cdot S_1](t) + [pS_1 \cdot D_6 \cdot pS_1](t) + [pS_1 \cdot D_6 \cdot S_3](t) + [pS_1 \cdot D_6 \cdot pS_3](t) + [pS_1](t),$$

$$[pS_3]_{T,6}(t) = [D_6 \cdot pS_3](t) + [pS_3 \cdot D_6 \cdot S_3](t) + [pS_3 \cdot D_6 \cdot pS_3](t) + [pS_3 \cdot D_6 \cdot S_1](t) + [pS_3 \cdot D_6 \cdot pS_1](t) + [pS_3](t),$$

for the hypil-6 model, and by

$$[pS_1]_{T,27}(t) = [pS_1 \cdot D_{27}](t) + [D_{27} \cdot pS_1](t) + [pS_1 \cdot D_{27} \cdot S_1](t) + [S_1 \cdot D_{27} \cdot pS_1](t) + [pS_1 \cdot D_{27} \cdot pS_1](t) + [pS_1 \cdot D_{27} \cdot S_3](t) + [S_3 \cdot D_{27} \cdot pS_1](t) + [pS_1 \cdot D_{27} \cdot pS_3](t) + [pS_3 \cdot D_{27} \cdot pS_1](t) + [pS_1](t),$$

$$[pS_3]_{T,27}(t) = [pS_3 \cdot D_{27}](t) + [D_{27} \cdot pS_3](t) + [pS_3 \cdot D_{27} \cdot S_3](t) + [S_3 \cdot D_{27} \cdot pS_3](t) + [pS_3 \cdot D_{27} \cdot pS_3](t) + [pS_3 \cdot D_{27} \cdot S_1](t) + [S_1 \cdot D_{27} \cdot pS_3](t) + [pS_1 \cdot D_{27} \cdot pS_3](t) + [pS_3 \cdot D_{27} \cdot pS_1](t) + [pS_3](t),$$

for the il-27 model.

having developed two mathematical models for the stimulation of the experimental system with hypil-6 and il-27, our objective was then to parameterise these models using approximate bayesian computation sequential monte carlo (abc-smc). firstly, a bayesian model selection was carried out to determine which hypothesis (mechanism) of internalisation/degradation of receptor molecules is most likely given the data. once a hypothesis was selected, the abc-smc method, together with the experimental data, allows one to obtain posterior distributions for each of the parameter values and initial concentrations in the mathematical models. in this way, we can learn which reactions and parameters in the models are causing the differential signaling by pstat observed when stimulating with hypil-6 and il-27.

the experimental data we used to compare with the mathematical model outputs was the mean relative fluorescence intensity of total phosphorylated stat1 and total phosphorylated stat3 in both rpe1 and th-1 cells (supp. figure b and c). we normalised the data to obtain dimensionless values, which can be compared with the mathematical model outputs. firstly, we constructed a linear model for the fluorescence intensity (background fluorescence) of antibodies for phosphorylated stat1 and stat3 in unstimulated cells. we subtracted the value of this linear model at each time point from the corresponding fluorescence intensity in hypil-6 and il-27 stimulated cells, for each repeat of the experiment and each cell type. denoting by $f$ the experimental fluorescence intensity, $f(r, i, tp, j, d)$ corresponds to the fluorescence intensity for the $r$th repeat, $r \in R = \{1, 2, 3, 4\}$, with antibody for stat$i$, $i \in I = \{1, 3\}$, at time point $tp \in TP$ = { min, min, min, min, min, min, min, min}, under stimulation by cytokine il-$j$ (hypil-$j$ when $j = 6$), with $j \in J = \{6, 27\}$, and in cell type $d \in D = \{\text{rpe1}, \text{th-1}\}$. each data point $data(r, i, tp, j, d)$, to be used in the bayesian inference and bayesian model selection, was then computed as

$$data(r, i, tp, j, d) = \frac{f(r, i, tp, j, d)}{f(r, i, tp = \ \text{min}, j = 27, d)}.$$

to compare the model output, $sim$, with the data, the output was normalised in the same way as the data, i.e.,

$$sim(i, tp, j, d) = \frac{[pS_i]_{T,j}(tp, d)}{[pS_i]_{T,27}(\ \text{min}, d)},$$

where $[pS_i]_{T,j}(tp, d)$ denotes the total concentration of phosphorylated stat$i$ at time $tp$ (see equations - ) when considering cell type $d$. in this way, experimental data and the mathematical model outputs are comparable. the similarity between the model output and the data points is then computed by the introduction of a distance measure $\delta(sim, data)$.
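the normalisation steps described above can be sketched in a few lines of python. this is an illustrative sketch under stated assumptions, not the authors' implementation: the background is fitted by ordinary least squares, and the function names, the dictionary layout and the `ref_time` and `ref_cytokine` arguments (standing in for the reference time point and cytokine used in the denominator) are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_line(times: List[float], values: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def normalise(
    times: List[float],
    stimulated: Dict[str, List[float]],
    unstimulated: List[float],
    ref_time: float,
    ref_cytokine: str,
) -> Dict[str, List[float]]:
    """Background-subtract and scale one repeat of one cell type.

    stimulated maps a cytokine label to intensities at each entry of times;
    unstimulated holds the background intensities at the same times.
    """
    slope, intercept = fit_line(times, unstimulated)
    # Subtract the fitted linear background at every time point.
    corrected = {
        cyt: [v - (slope * t + intercept) for t, v in zip(times, vals)]
        for cyt, vals in stimulated.items()
    }
    # Divide by the corrected intensity at the (assumed) reference condition.
    ref = corrected[ref_cytokine][times.index(ref_time)]
    if ref == 0:
        raise ValueError("reference intensity is zero after background subtraction")
    return {cyt: [v / ref for v in vals] for cyt, vals in corrected.items()}
```

the same scaling would be applied to the simulated totals, so that data and model output are both dimensionless and share the same reference condition.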
here, this distance measure was chosen as a generalisation of the euclidean distance, where

$$\delta_d(sim, data)^2 = \sum_{j \in J} \sum_{tp \in TP} \sum_{i \in I} \big(sim(i, tp, j, d) - \mu_{data}(i, tp, j, d)\big)^2,$$

for $d \in D = \{\text{rpe1}, \text{th-1}\}$, where $\mu_{data}(i, tp, j, d)$ is the mean of the four repeats of the data and is given by

$$\mu_{data}(i, tp, j, d) = \frac{1}{4} \sum_{r=1}^{4} data(r, i, tp, j, d).$$

to carry out the bayesian model selection and bayesian parameter inference, prior beliefs about the parameters were firstly defined. each of the parameters (reaction rates) and initial concentrations in the model were sampled from a prior distribution, where the distribution was informed by experimental data or values from the literature, when possible. the choice of prior distributions is given in table .

parameter | prior distribution | reference
$r_{1,6}^{+}$ | for r ∼ N(− , . ) | *
$r_{1,6}^{-}$ | for r ∼ N(− . , . ) | *
$r_{1,27}^{+}$ | for r ∼ N(− . , . ) | *
$r_{1,27}^{-}$ | for r ∼ N(− . , . ) | *
$r_{2,j}^{+}$ for $j \in \{6, 27\}$ | for r ∼ Unif(− , ) | ( )
$r_{2,j}^{-}$ for $j \in \{6, 27\}$ | for r ∼ Unif(− , ) | ( )
$k_{ia}^{+}$, $k_{ib}^{+}$ for $i \in \{1, 3\}$ | for r ∼ Unif(− , ) | **
$k_{ia}^{-}$, $k_{ib}^{-}$ for $i \in \{1, 3\}$ | for r ∼ Unif(− , ) | **
$q$ | for r ∼ Unif(− , ) | assumed
$d_i$ for $i \in \{1, 3\}$ | for r ∼ Unif(− ,− ) | ***
$\beta_j$ for $j \in \{6, 27\}$ | for r ∼ Unif(− ,− ) | †
$[R_1(0)]$ | N( . , . ) | ‡
$[R_2(0)]$ | N( . , . ) | ‡
$[S_1(0)]$ | N( , ) | ( )
$[S_3(0)]$ | N( , ) | ( )

table : prior distributions assigned to each parameter and initial concentration in the model.
* these distributions are centred around measurements obtained from cell surface receptor quantification experiments.
** these distributions were derived based on $K_d$ values obtained from the literature ( ).
*** these distributions are based on values derived from experimental data in which the cells were treated with tofacitinib.
† these distributions were based on values derived from experimental data in which the cells were treated with cycloheximide.
‡ these distributions were based on computations involving approximate cell sizes and average numbers of molecules per cell.

we made use of the prior distributions from table to carry out a bayesian model selection to determine which hypothesis is most likely given the rpe1 data for both hypil-6 and il-27 signaling. we ran simulations for each mathematical model (hypil-6 and il-27) and for each hypothesis, sampling model parameters from their prior distributions. we then computed a summary statistic for varying values of $\delta_{\mathrm{rpe1},*}$, the distance threshold between the mathematical model and data at which parameters are accepted (or rejected) in the abc. finally, we computed $f(H_k)$, the number of accepted parameter sets for hypothesis $k$, where the parameter sets are accepted if they result in a distance value less than or equal to the distance threshold $\delta_{\mathrm{rpe1},*}$. this allowed us to compute the relative probability, $p(H_k)$, for each hypothesis, as defined by the following equation

$$p(H_k \mid \delta_{\mathrm{rpe1},*}) = \frac{f(H_k \mid \delta_{\mathrm{rpe1},*})}{f(H_1 \mid \delta_{\mathrm{rpe1},*}) + f(H_2 \mid \delta_{\mathrm{rpe1},*})},$$

for $k \in \{1, 2\}$. the results of the model selection analysis for rpe1 are shown in figure d, where the relative probability of hypothesis increases as $\delta_{\mathrm{rpe1},*}$ tends to 0, whilst the relative probability of hypothesis decreases as a function of $\delta_{\mathrm{rpe1},*}$. we hence concluded that the experimental data together with the mathematical models for hypil-6 and il-27 signaling provide greater support to hypothesis (around %) when compared to hypothesis (around %). we note that as the distance threshold, $\delta_{\mathrm{rpe1},*}$, is increased, both hypotheses become equally likely, as is to be expected.
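the acceptance counting behind the relative probabilities above can be illustrated with a short python sketch. it is a minimal illustration of abc rejection-based model selection, not the authors' code: the function names are hypothetical, the distance is taken as the square root of the summed squared differences between flattened model output and mean data, and both hypotheses are assumed to have been simulated the same number of times.

```python
import math
from typing import List, Optional, Tuple


def distance(sim: List[float], data_mean: List[float]) -> float:
    """Euclidean-type distance between flattened model output and mean data."""
    return math.sqrt(sum((s - m) ** 2 for s, m in zip(sim, data_mean)))


def relative_probabilities(
    distances_h1: List[float],
    distances_h2: List[float],
    threshold: float,
) -> Optional[Tuple[float, float]]:
    """Relative probability of each hypothesis at a given distance threshold.

    Each list holds one distance per prior sample simulated under that
    hypothesis. Returns None when no parameter set is accepted at all.
    """
    # Count parameter sets whose distance is at or below the threshold.
    f1 = sum(d <= threshold for d in distances_h1)
    f2 = sum(d <= threshold for d in distances_h2)
    total = f1 + f2
    if total == 0:
        return None
    return f1 / total, f2 / total
```

with a very large threshold every parameter set is accepted under both hypotheses, so the two relative probabilities approach one half, which matches the behaviour noted in the text.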
given the results of the model selection, the bayesian parameter inference for the mathematical models of hypil-6 and il-27 signaling was only carried out for hypothesis . we used the abc sequential monte carlo (abc-smc) approach ( ) to obtain posterior distributions for the parameters in table , making use of the prior distributions in table . all model parameters in table were estimated for the rpe1 data set. a subset of the parameters, which we would expect may vary with cell type, was then estimated for the th-1 data set. in particular, the parameters not being estimated for th-1 were sampled from the posterior distributions obtained via the abc-smc for rpe1, and those parameters estimated separately for th-1 were: $q$, $d_1$, $d_3$, $\beta_6$, $\beta_{27}$, $[R_1(0)]$, $[R_2(0)]$, $[S_1(0)]$ and $[S_3(0)]$.

to further validate the two mathematical models of cytokine signaling, we aimed to reproduce additional experimental results making use of the posterior parameter predictions from the rpe1 data abc-smc. firstly, in order to replicate the experimental dose response curve seen in supp. fig. a, we ran both models using the accepted parameter sets from the abc-smc for different values of cytokine concentration, within the range [ , – "] log nm. the results of this analysis are seen in supp. fig. b. we also modified the mathematical models to allow them to describe the il-27rα-gp130 chimera experiments (fig. c).
in particular, a new mathematical model for the chimera experiments was developed as follows: it consisted of the odes from the il-27 model which are involved in the formation of the dimer (equations ( ) – ( )) and the odes from the hypil-6 model post-dimer formation (equations ( ) – ( )), in which $D_6$ was replaced by $D_{27}$. the ode for the il-27 induced dimer in the chimera model was as follows

$$\frac{d[D_{27}]}{dt} = r_{2,27}^{+}[C_2][R_1] - r_{2,27}^{-}[D_{27}] - k_{1a}^{+}[D_{27}][S_1] + k_{1a}^{-}\big([S_1 \cdot D_{27}] + [pS_1 \cdot D_{27}]\big) - k_{3a}^{+}[D_{27}][S_3] + k_{3a}^{-}\big([S_3 \cdot D_{27}] + [pS_3 \cdot D_{27}]\big) - \beta_{27}[D_{27}].$$

we simulated both the original mathematical model of il-27 and the chimera model using the accepted parameter sets from the abc-smc. the results can be seen in supp. fig. a.

finally, we focussed on one of the mutant varieties of il-27rα, y f, and sought to reproduce the results of fig. b making use of the mathematical model of il-27 signaling. since the mutation decreases the affinity of stat1 to il-27rα, we fixed the association and dissociation rates of stat1 to the il-27rα chain, $k_{1b}^{+}$ and $k_{1b}^{-}$, at values which resulted in a high µm affinity. the specific values chosen were $k_{1b}^{+}$ = nm$^{-1}$ s$^{-1}$ and $k_{1b}^{-}$ = s$^{-1}$, which yields an affinity of µm. the rate $k_{1b}^{-}$ was chosen as approximately the median of the posterior distribution for this parameter from the abc-smc, and the rate $k_{1b}^{+}$ was then significantly decreased in order to increase the affinity value (a larger dissociation constant, i.e. weaker binding). we simulated the mathematical model of il-27 signaling using the accepted parameter sets from the abc-smc, but with the rates $k_{1b}^{+}$ and $k_{1b}^{-}$ fixed as described above. the pointwise medians and % credible intervals of these simulations are plotted in supp. fig. c, as well as the simulations for the wt, without altering any of the parameter values from the posterior distributions.
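the pointwise medians and credible intervals mentioned above can be computed from a set of simulated trajectories as in the python sketch below. it is a hedged illustration rather than the authors' procedure: the function names are hypothetical, quantiles use linear interpolation between order statistics, and the interval `level` is left as a parameter because the percentage is not recoverable from the text.

```python
import math
from typing import List, Sequence, Tuple


def quantile(sorted_vals: Sequence[float], q: float) -> float:
    """Quantile of pre-sorted values with linear interpolation, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def pointwise_summary(
    trajectories: List[List[float]], level: float
) -> List[Tuple[float, float, float]]:
    """Return (lower, median, upper) at each time point.

    trajectories holds one simulated time course per accepted parameter
    set, all sampled at the same time points; level is the central
    interval mass, for example 0.9 for a 90% interval.
    """
    tail = (1 - level) / 2
    summary = []
    # zip(*...) regroups the values so that each tuple is one time point.
    for values in zip(*trajectories):
        ordered = sorted(values)
        summary.append(
            (quantile(ordered, tail), quantile(ordered, 0.5), quantile(ordered, 1 - tail))
        )
    return summary
```

the summary is taken independently at each time point, so the resulting band is pointwise and does not describe any single simulated trajectory.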
altering the binding affinity of stat1 to il-27rα in this way in the mathematical model allows us to generate results which replicate reasonably well the experimental observations for the y f mutant in figure b.

live-cell dual-color single-molecule imaging studies: single molecule imaging experiments were carried out by total internal reflection fluorescence (tirf) microscopy with an inverted microscope (olympus ix ) equipped with a triple-line total internal reflection (tir) illumination condenser (olympus) and a back-illuminated electron multiplied (em) ccd camera (ixon du d, x pixel, andor technology) as recently described ( - ). a x magnification objective with a numerical aperture of . (uapo / . tirfm, olympus) was used for tir illumination. all experiments were carried out at room temperature in medium without phenol red supplemented with an oxygen scavenger and a redox-active photoprotectant to minimize photobleaching ( ). for heterodimerization experiments of il-27ra and gp130, cell surface labeling of rpe1 gp130 ko cells, co-transfected with mxfpe-il-27ra and mxfpm-gp130, was achieved by adding agfp-ennbrho and agfp-minbdy to the medium at equal concentrations ( nm) and incubating for at least min prior to stimulation with il-27 ( nm) or hypil-6 ( nm). for homodimerization experiments with mxfpm-gp130, agfp-minbdy and agfp-minbrho ( ) were used for cell surface receptor labelling as described above. the nanobodies were kept in the bulk solution during the whole experiment in order to ensure high equilibrium binding to mxfp-gp130.
for simultaneous dual color acquisition, agfp-nbrho was excited by a nm diode-pumped solid-state laser at . mw (~ w/cm ) and agfp-nbdy by a nm laser diode at . mw (~ w/cm ). fluorescence was detected using a spectral image splitter (dualview, optical insight) with a dcxr dichroic beam splitter (chroma) in combination with the bandpass filter / (semrock) for detection of rho and / (chroma) for detection of dy , dividing each emission channel into x pixels. image stacks of frames were recorded at ms/frame. single molecule localization and single molecule tracking were carried out using the multiple-target tracing (mtt) algorithm ( ) as described previously ( ). step-length histograms were obtained from single molecule trajectories and fitted by a two-fraction mixture model of brownian diffusion. average diffusion constants were determined from the slope ( - steps) of the mean square displacement versus time lapse diagrams. immobile molecules were identified by the density-based spatial clustering of applications with noise (dbscan) algorithm as described recently ( ). for comparing diffusion properties and for co-tracking analysis, immobile particles were excluded from the data set. prior to co-localization analysis, imaging channels were aligned with sub-pixel precision by using a spatial transformation. to this end, a transformation matrix was calculated based on a calibration measurement with multicolour fluorescent beads (tetraspeck microspheres . mm, invitrogen) visible in both spectral channels (cp tform of type ‘affine’, the mathworks matlab a). individual molecules were regarded as co-localized if a particle was detected in both spectral channels of a single frame within a distance threshold of nm radius. for single molecule co-tracking analysis, the mtt algorithm was applied to this dataset of co-localized molecules to reconstruct co-locomotion trajectories (co-trajectories) from the identified population of co-localizations.
for the co-tracking analysis, only trajectories with a minimum of steps (~ ms) were considered in order to robustly remove random receptor co-localizations ( ). for heterodimerization experiments of mxfpe-il-27ra and mxfpm-gp130, the relative fraction of dimerized receptors was calculated from the number of co-trajectories relative to the number of il-27ra trajectories. gp130 was expressed in moderate excess (~ . - fold), so that maximal receptor assembly was not limited by abundance of the low-affinity subunit gp130. for homodimerization experiments with gp130, the relative fraction of co-tracked molecules was determined with respect to the absolute number of trajectories and corrected for gp130 stochastically double-labelled with the same fluorophore species as follows:

$$AB^{*} = \frac{AB}{2 \times \left[\left(\frac{A}{A+B}\right) \times \left(\frac{B}{A+B}\right)\right]}, \qquad rel.\ co\text{-}locomotion = \frac{2 \times AB^{*}}{(A+B)}$$

where a, b, ab and ab* are the numbers of trajectories observed for rho , dy , co-trajectories and corrected co-trajectories, respectively.

the two-dimensional equilibrium dissociation constants ($K_D^{2D}$) were calculated according to the law of mass action for a monomer-dimer equilibrium:

heterodimerization (il-27ra+gp130):

$$K_D^{2D} = \frac{\big([GP130] - (\alpha \times [IL27Ra])\big) \times \big([IL27Ra] - (\alpha \times [IL27Ra])\big)}{(\alpha \times [IL27Ra])}$$

or

$$K_D^{2D} = [GP130] \times \left(\frac{1}{\alpha} - 1\right) + [IL27Ra] \times (\alpha - 1)$$

with: $\alpha$ = fraction of il-27-bound il27ra in complex with gp130.

homodimerization (gp130+gp130):

$$K_D^{2D} = \frac{[M]^2}{[D]} = \frac{([M]_0 - 2[D])^2}{[D]}$$

$$K_D^{2D} = \frac{\big([GP130]_0 - 2 \times (\alpha \times [GP130]_0)\big)^2}{2 \times (\alpha \times [GP130]_0)}$$

with: $\alpha$ = fraction of gp130 homodimers relative to $[GP130]_0$,

where $[M]$ and $[D]$ are the concentrations of the monomer and the dimer, respectively, and $[M]_0$ is the total receptor concentration.

references:
. j. j. o'shea, r. plenge, jak and stat signaling molecules in immunoregulation and immune-mediated disease. immunity , - ( ). . s. pflanz et al., il- , a heterodimeric cytokine composed of ebi and p protein, induces proliferation of naive cd + t cells. immunity , - ( ). . h. yoshida, c. a. hunter, the immunobiology of interleukin- . annu rev immunol , - ( ). . j. s. stumhofer et al., interleukin negatively regulates the development of interleukin -producing t helper cells during chronic inflammation of the central nervous system. nat immunol , - ( ). . c. diveu et al., il- blocks rorc expression to inhibit lineage commitment of th cells. j immunol , - ( ). . d. c. fitzgerald et al., suppression of autoimmune inflammation of the central nervous system by interleukin secreted by interleukin -stimulated t cells. nat immunol , - ( ). . j. s. stumhofer et al., interleukins and induce stat -mediated t cell production of interleukin . nat immunol , - ( ). . c. pot, l. apetoh, a. awasthi, v. k. kuchroo, induction of regulatory tr cells and inhibition of t(h) cells by il- . semin immunol , - ( ). . m. j. boulanger, d. c. chow, e. e. brevnova, k. c. garcia, hexameric structure and assembly of the interleukin- /il- alpha-receptor/gp complex. science , - ( ). . s. rose-john, interleukin- family cytokines. cold spring harb perspect biol , ( ). . c. a. hunter, s.
a. jones, il- as a keystone cytokine in health and disease. nature immunology , - ( ). . t. korn et al., il- controls th immunity in vivo by inhibiting the conversion of conventional t cells into foxp + regulatory t cells. proc natl acad sci u s a , - ( ). . a. kimura, t. kishimoto, il- : regulator of treg/th balance. eur j immunol , - ( ). . g. w. jones et al., loss of cd + t cell il- r expression during inflammation underlines a role for il- trans signaling in the local maintenance of th cells. j immunol , - ( ). . c. rolvering et al., crosstalk between different family members: il recapitulates ifn gamma responses in hcc cells, but is inhibited by il -type cytokines. bba-mol cell res , - ( ). . a. p. costa-pereira et al., mutational switch of an il- response to an interferon- gamma-like response. p natl acad sci usa , - ( ). . j. schmitz, m. weissenbach, s. haan, p. c. heinrich, f. schaper, socs exerts its inhibitory function on interleukin- signal transduction through the shp recruitment site of gp . journal of biological chemistry , - ( ). . h. yasukawa et al., il- induces an anti-inflammatory response in the absence of socs in macrophages. nat immunol , - ( ). . b. a. croker et al., socs negatively regulates il- signaling in vivo. nat immunol , - ( ). . c. brender et al., suppressor of cytokine signaling regulates cd t-cell proliferation by inhibition of interleukins and . blood , - ( ). . a. camporeale, v. poli, il- , il- and stat : a holy trinity in auto-immunity? front biosci (landmark ed) , - ( ). . g. regis, s. pensa, d. boselli, f. novelli, v.
poli, ups and downs: the stat :stat seesaw of interferon and gp receptor signalling. semin cell dev biol , - ( ). . s. lucas, n. ghilardi, j. li, f. j. de sauvage, il- regulates il- responsiveness of naive cd (+) t cells through stat -dependent and -independent mechanisms. p natl acad sci usa , - ( ). . s. kamiya et al., an indispensable role for stat in il- -induced t-bet expression but not proliferation of naive cd (+) t cells. journal of immunology , - ( ). . a. takeda et al., cutting edge: role of il- /wsx- signaling for induction of t-bet through activation of stat during initial th commitment. journal of immunology , - ( ). . c. neufert et al., il- controls the development of inducible regulatory t cells and th cells via differential effects on stat . eur j immunol , - ( ). . t. owaki et al., stat is indispensable to il- -mediated cell proliferation but not to il- -induced th differentiation and suppression of proinflammatory cytokine production. journal of immunology , - ( ). . k. hirahara et al., asymmetric action of stat transcription factors drives transcriptional outputs and cytokine specificity. immunity , - ( ). . s. oniki et al., interleukin- and interleukin- exert quite different antitumor and vaccine effects on poorly immunogenic melanoma. cancer res , - ( ). . m. fischer et al., i. a bioactive designer cytokine for human hematopoietic progenitor cell expansion. nat biotechnol , - ( ). . h. h. oberg, d. wesch, s. grussel, s. rose-john, d. kabelitz, differential expression of cd and cd mediates different stat- phosphorylation in cd +cd - and cd high regulatory t cells. int immunol , - ( ). . p. o. krutzik, m. r. clutter, a. trejo, g. p. nolan, fluorescent cell barcoding for multiplex flow cytometry. curr protoc cytom chapter , unit ( ). . u. a. betz, w. muller, regulated expression of gp and il- receptor alpha chain in t cell maturation and activation. int immunol , - ( ). . j. 
martinez-fabregas et al., kinetics of cytokine receptor trafficking determine signaling and functional selectivity. elife , ( ). . c. gorby et al., engineered il- variants elicit potent immunomodulatory effects at low ligand doses. sci signal , ( ). . v. ruprecht, weghuber, j., wieser, s., schütz, g. j, in advances in planar lipid bilayers and liposomes. ( ), vol. ,, pp. - . . i. moraga et al., instructive roles for cytokine-receptor binding parameters in determining signaling and functional potency. science signaling , ( ). . s. wilmes et al., receptor dimerization dynamics as a regulatory valve for plasticity of type i interferon signaling. j cell biol , - ( ). . s. wilmes et al., mechanism of homodimeric cytokine receptor activation and dysregulation by oncogenic mutations. science , - ( ). . i. moraga et al., tuning cytokine receptor signaling by re-orienting dimer geometry with surrogate ligands. cell , - ( ). . s. pflanz et al., wsx- and glycoprotein constitute a signal-transducing receptor for il- . j immunol , - ( ). . m. wiederkehr-adam et al., characterization of phosphopeptide motifs specific for the src homology domains of signal transducer and activator of transcription (stat ) and stat . j biol chem , - ( ). . a. pradhan, q. t. lambert, l. n. griner, g. w. reuther, activation of jak -v f by components of heterodimeric cytokine receptors. j biol chem , - ( ). . h. kim, t. s. hawley, r. g. hawley, h. baumann, protein tyrosine phosphatase (shp- ) moderates signaling by gp but is not required for the induction of acute- phase plasma protein genes in hepatic cells.
mol cell biol , - ( ). . d. w. huang, b. t. sherman, r. a. lempicki, systematic and integrative analysis of large gene lists using david bioinformatics resources. nat protoc , - ( ). . j. bancerek et al., cdk kinase phosphorylates transcription factor stat to selectively regulate the interferon response. immunity , - ( ). . s. rutz et al., deubiquitinase duba is a post-translational brake on interleukin- production in t cells. nature , - ( ). . k. l. o'hagan, s. d. miller, h. phee, pak is essential for the function of foxp +regulatory t cells through maintaining a suppressive treg phenotype. sci rep- uk , ( ). . d. z. ye, j. field, pak signaling in cancer. cell logist , - ( ). . y. liao, j. wang, e. j. jaehnig, z. shi, b. zhang, webgestalt : gene set analysis toolkit with revamped uis and apis. nucleic acids res , w -w ( ). . j. satoh, h. tabunoki, a comprehensive profile of chip-seq-based stat target genes suggests the complexity of stat -mediated gene regulatory mechanisms. gene regul syst bio , - ( ). . i. rusinova et al., interferome v . : an updated database of annotated interferon- regulated genes. nucleic acids res , d - ( ). . h. n. suh et al., role of interleukin- in the control of dna synthesis of hepatocytes: involvement of pkc, p / mapks, and ppardelta. cell physiol biochem , - ( ). . a. v. villarino et al., il- limits il- production during th differentiation. j immunol , - ( ). . k. hirahara et al., interleukin- priming of t cells controls il- production in trans via induction of the ligand pd-l . immunity , - ( ). . x. hu et al., sensitization of ifn-gamma jak-stat signaling during macrophage activation. nat immunol , - ( ). . v. francois-newton, m. livingstone, b. payelle-brogard, g. uze, s. pellegrini, usp establishes the transcriptional and anti-proliferative interferon alpha/beta differential. biochem j , - ( ). . k. zenke, m. muroi, k. i. tanamoto, irf supports dna binding of stat by promoting its phosphorylation. immunol cell biol , - ( ). . k. 
karwacz et al., critical role of irf and batf in forming chromatin landscape during type regulatory cell differentiation. nat immunol , - ( ). . a. yoshimura, y. wakabayashi, t. mori, cellular and molecular basis for the regulation of inflammation by tgf-beta. j biochem , - ( ). . a. awasthi et al., a dominant function for interleukin in generating interleukin - producing anti-inflammatory t cells. nat immunol , - ( ). . j. b. brown et al., p-selectin glycoprotein ligand- is needed for sequential recruitment of t-helper (th ) and local generation of th t cells in dextran sodium sulfate (dss) colitis. inflamm bowel dis , - ( ). . m. matsumoto et al., cd collaborates with p-selectin glycoprotein ligand- to mediate e-selectin-dependent t cell migration into inflamed skin. j immunol , - ( ). . d. n. slenter et al., wikipathways: a multifaceted pathway database bridging metabolomics to other omics research. nucleic acids res , d -d ( ). . a. petretto et al., proteomic analysis uncovers common effects of ifn-gamma and il- on the hla class i antigen presentation machinery in human cancer cells. oncotarget , - ( ). . l. h. wong, i. hatzinisiriou, r. j. devenish, s. j. ralph, ifn-gamma priming up- regulates ifn-stimulated gene factor (isgf ) components, augmenting responsiveness of ifn-resistant melanoma cells to type i ifns. j immunol , - ( ). . m. tokuyama et al., ervmap analysis reveals genome-wide transcription of human endogenous retroviruses. proc natl acad sci u s a , - ( ). . c. garbers et al., plasticity and cross-talk of interleukin -type cytokines. cytokine growth factor rev , - ( ).
. s. kang, m. narazaki, h. metwally, t. kishimoto, historical overview of the interleukin- family cytokine. j exp med , ( ). . r. umeshita-suyama et al., characterization of il- and il- signals dependent on the human il- receptor alpha chain : redundancy of requirement of tyrosine residue for stat activation. int immunol , - ( ). . o. w. nadeau et al., the proximal tyrosines of the cytoplasmic domain of the beta chain of the type i interferon receptor are essential for signal transducer and activator of transcription (stat) activation. evidence that two stat sites are required to reach a threshold of interferon alpha-induced stat tyrosine phosphorylation that allows normal formation of interferon-stimulated gene factor . j biol chem , - ( ). . m. n. sharif et al., ifn-alpha priming results in a gain of proinflammatory function by il- : implications for systemic lupus erythematosus pathogenesis. j immunol , - ( ). . d. richter et al., ligand-induced type ii interleukin- receptor dimers are sustained by rapid re-association within plasma membrane microcompartments. nat commun , ( ). . j. p. twohig et al., activation of naive cd (+) t cells re-tunes stat signaling to deliver unique cytokine responses in memory cd (+) t cells. nat immunol , - ( ). . p. c. heinrich et al., principles of interleukin (il)- -type cytokine signalling and its regulation. biochem j , - ( ). . d. levin, d. harari, g. schreiber, stochastic receptor expression determines cell fate upon interferon treatment. mol cell biol , - ( ). . i. moraga, d. harari, g. schreiber, g. uze, s. pellegrini, receptor density is key to the alpha /beta interferon differential activities. mol cell biol , - ( ). . c. c. m. ho et al., decoupling the functional pleiotropy of stem cell factor by tuning c-kit signaling. cell , - e ( ). . p. charlot-rabiega, e. bardel, c. dietrich, r. kastelein, o. devergne, signaling events involved in interleukin (il- )-induced proliferation of human naive cd + t cells and b cells. 
j biol chem , - ( ). . j. diegelmann, t. olszak, b. goke, r. s. blumberg, s. brand, a novel role for interleukin- (il- ) as mediator of intestinal epithelial barrier protection mediated via differential signal transducer and activator of transcription (stat) protein .cc-by . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . / signaling and induction of antibacterial and anti-inflammatory proteins. journal of biological chemistry , - ( ). . h. bender et al., interleukin- displays interferon-gamma-like functions in human hepatoma cells and hepatocytes. hepatology , - ( ). . t. imamichi, j. yang, w. huang da, b. sherman, r. a. lempicki, interleukin- induces interferon-inducible genes: analysis of gene expression profiles using affymetrix microarray and david. methods mol biol , - ( ). . j. m. fakruddin et al., noninfectious papilloma virus-like particles inhibit hiv- replication: implications for immune control of hiv- infection by il- . blood , - ( ). . a. c. frank et al., interleukin- , an anti-hiv- cytokine, inhibits replication of hepatitis c virus. j interferon cytokine res , - ( ). . s. l. laporte et al., molecular and structural basis of cytokine receptor pleiotropy in the interleukin- / system. cell , - ( ). . j. b. spangler, i. moraga, k. m. jude, c. s. savvides, k. c. garcia, a strategy for the selection of monovalent antibodies that span protein dimer interfaces. j biol chem , - ( ). . a. kirchhofer et al., modulation of protein properties in living cells using nanobodies. nat struct mol biol , - ( ). . m. c. hochberg, updating the american college of rheumatology revised criteria for the classification of systemic lupus erythematosus. 
arthritis rheum , ( ). . j. cox, m. mann, maxquant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. nat biotechnol , - ( ). . j. cox et al., andromeda: a peptide search engine integrated into the maxquant environment. j proteome res , - ( ). . p. o. krutzik, g. p. nolan, fluorescent cell barcoding in flow cytometry allows high- throughput drug screening and signaling profiling. nat methods , - ( ). . w. huang da, b. t. sherman, r. a. lempicki, bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. nucleic acids res , - ( ). . w. huang da, b. t. sherman, r. a. lempicki, systematic and integrative analysis of large gene lists using david bioinformatics resources. nat protoc , - ( ). . n. kozer et al., exploring higher-order egfr oligomerisation and phosphorylation--a combined experimental and theoretical approach. mol biosyst , - ( ). . d. n. itzhak, s. tyanova, j. cox, g. h. borner, global, quantitative and dynamic mapping of protein subcellular localization. elife , ( ). . t. toni, d. welch, n. strelkowa, a. ipsen, m. p. stumpf, approximate bayesian computation scheme for parameter inference and model selection in dynamical systems. j r soc interface , - ( ). . j. vogelsang et al., a reducing and oxidizing system minimizes photobleaching and blinking of fluorescent dyes. angew chem int ed engl , - ( ). . a. kirchhofer et al., modulation of protein properties in living cells using nanobodies. nat struct mol biol , -u ( ). . a. serge, n. bertaux, h. rigneault, d. marguet, dynamic multiple-target tracing to probe spatiotemporal cartography of cell membranes. nat methods , - ( ). . c. you et al., receptor dimer stabilization by hierarchical plasma membrane microcompartments regulates cytokine signaling. sci adv , e ( ). . f. roder, a. lubk, d. wolf, t. niermann, noise estimation for off-axis electron holography. ultramicroscopy , - ( ). 
figure legends:

figure cytokine receptor activation by il- and (hyp)il- :
a) cartoon model of stepwise assembly of the il- and hypil- -induced receptor complex and subsequent activation of stat and stat .
b) dose-dependent phosphorylation of stat and stat as a response to il- and hypil- stimulation in th- cells, normalized to maximal il- stimulation. data was obtained from three biological replicates with two technical replicates each, showing mean ± std dev.
c) phosphorylation kinetics of stat and stat followed after stimulation with saturating concentrations of il- ( nm) and hypil- ( nm) or unstimulated th- cells, normalized to maximal il- stimulation. data was obtained from five biological replicates with two technical replicates each, showing mean ± std dev.
d) top: phosphorylation kinetics of stat and stat followed after stimulation with hypil- ( nm) or left unstimulated, comparing wt rpe and rpe gp ko reconstituted with high levels of mxfpm-gp (= x [gp ]). data was normalized to maximal stimulation levels of wt rpe . left: cell surface gp levels comparing rpe gp ko, wt rpe and rpe gp ko stably expressing mxfpm-gp measured by flow cytometry.
data was obtained from one biological replicate with two technical replicates each, showing mean ± std dev. bottom right: cell surface levels of gp measured by flow cytometry for indicated cell lines.
e) cartoon model of cell surface labeling of mxfp-tagged receptors by dye-conjugated anti-gfp nanobodies (nb) and identification of receptor dimers by single molecule dual-colour co-localization.
f) raw data of dual-colour single-molecule tirf imaging of mxfpe-il- rαnb-rho and gp nb-dy after stimulation with il- . particles from the insets (il- ra: red & gp : blue) were followed by single molecule tracking ( frames ~ . s) and trajectories > steps ( ms) are displayed. receptor heterodimerization was detected by co-localization/co-tracking analysis.
g) relative number of co-trajectories observed for heterodimerization of il- rα and gp as well as homodimerization of gp for unstimulated cells or after indicated cytokine stimulation. each data point represents the analysis from one cell with a minimum of cells measured for each condition. *p < . , **p ≤ . ,***p ≤ . ; n.s., not significant.
h) stoichiometry of the il- –induced receptor complex revealed by bleaching analysis. left: intensity traces of mxfpe-il rαnb-rho and gp nb-dy were followed until fluorophore bleaching. middle: merged imaging raw data for selected timepoints. right: overlay of the trajectories for il- rα (red) and gp (blue).

figure : mathematical modelling results in rpe and th- cells.
a) simplified cartoon model of il- /hypil- signal propagation layers and coverage of the mathematical modelling approach.
b) model selection results showing the relative probabilities of each hypothesis, for different values of the distance threshold, 𝛿∗, in rpe cells.
c) pointwise median and % credible intervals of the predictions from the mathematical model, calibrated with the experimental data, using the posterior distributions for the parameters from the abc-smc.
d) kernel density estimates of the posterior distributions for the parameters 𝑝 ∈ {𝑟#,. & ,𝑟#,. , ,𝑟",. & ,𝑟",. , ,𝑘$% & ,𝑘$% , ,𝑘$' & ,𝑘$' , ,𝑞,𝑑$,𝛽., [𝑅#( )],[𝑅"( )],[𝑆#( )],[𝑆(( )]} in the mathematical models where 𝑗 ∈ { , } and 𝑖 ∈ { , }.

figure : il- rα cytoplasmic domain is required for sustained pstat kinetics.
a) representation of the cytoplasmic domain of il- rα with its highlighted tyrosine residues y and y .
b) stat and stat phosphorylation kinetics of rpe clones stably expressing wt and mutant il- rα after stimulation with il- ( nm, top panels) or after stimulation with hypil- ( nm, bottom panels), normalized to maximal levels of wt il- rα stimulated with il- (top) or hypil- (bottom). data was obtained from three experiments with two technical replicates each, showing mean ± std dev. bottom right: cell surface levels variants measured by flow cytometry for indicated il- rα cell lines.
c) cytoplasmic domain of il- rα is required for sustained pstat activation. left: cartoon representation of receptor complexes. right: stat and stat phosphorylation kinetics of rpe clones stably expressing wt il- rα and il- rα- gp chimera after stimulation with il- ( nm, top panels) or after stimulation with hypil- ( nm, bottom panels). data was normalized to maximal levels for each cytokine and cell line. data was obtained from two experiments with technical replicates each, showing mean ± std dev.
d) phosphatases do not account for differential pstat / activity induced by il- and hypil- . left: schematic representation of workflow using jak inhibitor tofacitinib.
right: mfi ratio of tofacitinib-treated and non-treated rpe mxfpe-il- rα cells for pstat and pstat after stimulation with il- ( nm) and hypil- ( nm). data was obtained from two experiments with two technical replicates each, showing mean ± std dev.

figure : unique and overlapping effects of il- and hypil- on the phosphoproteome of th- cells.
a) volcano plot of the phospho-sites regulated (p value ≤ . , fold change ≥ + . or ≤ - . ) by il- (left) and hypil- (right). data was obtained from three biological replicates.
b) heatmap representation (examples) of shared and differentially up- (left) and downregulated (right) phospho-sites after il- and hypil- stimulation. data represents the mean (log ) fold change of three biological replicates.
c) tyrosine and serine phosphorylation of selected stat proteins after stimulation with il- (red) and hypil- (blue). *p < . , **p ≤ . ,***p ≤ . ; n.s., not significant.
d) ps -stat and ps -stat phosphorylation kinetics in th- cells after stimulation with il- or hypil- , normalized to maximal il- stimulation. data was obtained from three biological replicates with two technical replicates each, showing mean ± std dev.
e) go analysis “biological processes” of the phospho-sites regulated by il- (red) and hypil- (blue) represented as bubble-plots.
f) phosphorylation of target proteins associated with stat /cdk transcription initiation complex after stimulation with il- (blue) and hypil- (red) and schematic representation of transcription regulation of rna polymerase ii with identified phospho-sites (red flags).

figure : kinetic decoupling of gene induction programs depends on sustained stat activation by il- .
a) principal component analysis for genes found to be significantly upregulated (left) or downregulated (right) for at least one of the tested conditions (time & cytokine). data was obtained from three biological replicates.
b) kinetics of gene induction shared between il- and hypil- (relative to il- ) for upregulated genes (red) or downregulated genes (green).
c) kinetics of gene numbers induced after il- and hypil- stimulation for upregulated genes (left) and downregulated genes (right).
d) gsea reactome analysis of selected pathways with significantly altered gene induction in response to il- or hypil- stimulation. data represents the mean (log ) fold change of three biological replicates.
e) cluster analysis comparing the gene induction kinetics after il- or hypil- stimulation. gene induction heatmaps for example genes as well as induction kinetics (mean) are shown for highlighted gene clusters. data represents the mean (log ) fold change of three biological replicates.

figure : il- -induced upregulation of irf amplifies induction of stat -dependent genes
a) kinetics of irf protein expression as a response to continuous il- and hypil- stimulation in th- cells. data was obtained from three biological replicates with two technical replicates each, showing mean ± std dev. dotted line indicates baseline level.
b) kinetics of irf protein expression and sirna-mediated irf knockdown in rpe il- rα cells stimulated with il- ( nm). data was obtained from one representative experiment with two technical replicates each, normalized to maximal irf induction ( h), showing mean ± std dev.
c) kinetics of stat (left) and stat (right) phosphorylation after sirna-mediated irf knockdown in rpe il- rα cells stimulated with il- ( nm). data was obtained from one representative experiment with two technical replicates each, showing mean ± std dev.
d) kinetics of gene induction (stat , gbp , oas , socs ) followed by rt qpcr in rpe il- rα cells stimulated with il- ( nm) with and without sirna-mediated knockdown of irf . data was obtained from three experiments with two technical replicates each, showing mean ± sem.

figure : il- -induced stat response drives global proteomic changes in th- cells.
a) workflow for quantitative silac proteomic analysis of th- cells continuously stimulated ( h) with il- ( nm), hypil- ( nm) or left untreated.
b) global proteomic changes in th- cells induced by il- (left) or hypil- (right) represented as volcano plots. proteins significantly up- or downregulated are highlighted in red (p value ≤ . , fold change ≥ + . or ≤ - . ). significantly altered isg-encoded proteins by il- are highlighted in yellow. data was obtained from three biological replicates.
c) venn diagrams comparing unique upregulated (left) and downregulated (right) proteins by il- (blue) and hypil- (red) as well as shared altered proteins. isg-encoded proteins are highlighted in yellow.
d) heatmaps of the top up- and downregulated proteins by il- compared to hypil- . data represents the mean (log ) fold change of three biological replicates.
e) heatmap representation and enrichment plot of proteins identified by gsea reactome pathway enrichment analysis “cytokine signaling and immune system” induced by il- . data represents the mean (log ) fold change of three biological replicates.
f) correlation of il- and hypil- -induced rna-seq transcript levels (≥ + or ≤ - fc) with quantitative proteomic data (≥ + . or ≤ - . fc).
data represents the mean (log ) fold change of three biological replicates.

figure : receptor and stat concentrations determine the nature of the cytokine response.
a) copy numbers of indicated proteins determined for different t-cell subsets using mass-spectrometry based proteomics (immpres - http://immpres.co.uk).
b) model predictions for varying levels of stat and stat (left panel) or il- rα and gp (right panel) for phosphorylation kinetics of stat and stat .
c) gene expression profiles determined by rnaseq analysis comparing indicated genes of a cohort of sle risk patients with a cohort of healthy controls. data obtained from: proc natl acad sci u s a , - . *p < . , **p ≤ . ,***p ≤ . ; n.s., not significant.
d) dose-dependent phosphorylation of stat and stat as a response to il- and hypil- stimulation in naive and ifnα -primed ( nm, h) th- cells, normalized to maximal il- stimulation (ctrl). data was obtained from four biological replicates with two technical replicates each, showing mean ± std dev.
e) phosphorylation of stat (left) and stat (right) as a response to il- ( nm, min) and hypil- ( nm, min) stimulation in healthy control (ctrl) and sle patient cd + t-cells. data was obtained from five healthy control donors ( ) and six sle patients. *p < . , **p ≤ . ,***p ≤ . ; n.s., not significant.
f) tofacitinib titration to inhibit stat and stat phosphorylation by hypil- ( nm, min) in th- cells (left) and rpe cells stably expressing wt il- rα (right).

supp. figure :
a) comparison of dose-dependent phosphorylation (stat / ) of purchased il- and mil- sc in activated cd + cells, normalized to maximal mfi levels. data was obtained from one (purchased) or two (mil- sc) biological replicates with two technical replicates each, showing mean ± std dev.
b) schematic workflow of t-cell isolation, th differentiation, fluorescence barcoding and gating strategy for high throughput flow cytometry.
c) phosphorylation kinetics of stat and stat followed after stimulation with il- ( nm) and hypil- ( nm) or unstimulated th cells. data (from fig. c) was normalized to maximal mfi levels for each cytokine. data was obtained from five biological replicates with two technical replicates each, showing mean ± std dev.
d) phosphorylation kinetics of activated pbmcs (cd +, cd +) of stat and stat followed after stimulation with il- ( nm) and hypil- ( nm) or unstimulated cells. data was normalized to maximal il- stimulation. data was obtained from two biological replicates with two technical replicates each, showing mean ± std dev.
e) dose-response experiments in wt rpe cells for pstat (left) and pstat (right), stimulated with il- or hypil- , normalized to maximal hypil- stimulation. data was obtained from one representative experiment with two technical replicates each, showing mean ± std dev.

supp. figure :
a) dose-response experiments for pstat and pstat comparing rpe gp ko cells (left), wt rpe (middle) and rpe mxfpe-il ra (right) after stimulation with il- or hypil- , normalized to maximal hypil- stimulation. data was obtained from one representative experiment with two technical replicates each, showing mean ± std dev.
b) ligand-induced receptor dimerization: top panel: dual-colour co-tracking of il- rα and gp in the absence (top) and presence (bottom) of il- ( nm). trajectories ( frames, ~ . s) of individual mxfpe-il rαnb-rho (red) and gp nb-dy (blue) and co-trajectories (magenta) are shown for a representative cell.
bottom panel: dual-colour co-tracking of gp in the absence (top) and presence (bottom) of hypil- ( nm). trajectories ( frames, ~ . s) of individual mxfpe-il rαnb-rho (red) and gp nb-dy (blue) and co-trajectories (magenta) are shown for a representative cell.
c) top: cartoon model of cell surface labeling of mxfp-tagged gp by dye-conjugated anti-gfp nanobodies (nb) and formation of single-colour homodimers (left) or dual-colour homodimers (right). below: examples for intensity traces of single-colour dual-step bleaching (left) or dual-colour single-step bleaching (right). insets show raw data for selected timepoints and corresponding trajectories.
d) top: comparison of diffusion coefficients (d) for mxfpe-il- rαnb-rho (red) and mxfpm-gp nb-dy (blue) in presence and absence of il- stimulation ( nm), as well as co-trajectories after il- stimulation (magenta). bottom: comparison of diffusion coefficients for mxfpm-gp nb-rho (red) in presence and absence of hypil- stimulation ( nm), as well as co-trajectories after hypil- stimulation (magenta). each data point represents the analysis from one cell with a minimum of cells measured for each condition. *p < . , **p ≤ . ,***p ≤ . ; n.s., not significant.

supp. figure :
a) reactions involving ligand binding and dimerization in the hypil- model.
b) reactions involving ligand binding and dimerization in the il- model.
c) reactions involving the stat molecules (𝑆. for 𝑗 ∈ { , }) in the hypil- model.
d) reactions involving the stat molecules (𝑆. for 𝑗 ∈ { , }) in the il- model.
e) reactions involving receptor internalisation/degradation in the hypil- model. here 𝐻 = 𝛽) and 𝐻 = 𝛾)([𝑝𝑆 ] + [𝑝𝑆 ]).
f) reactions involving receptor internalisation/degradation in the il- model. here 𝐻 = 𝛽"* and 𝐻 = 𝛾"*([𝑝𝑆 ] + [𝑝𝑆 ]).
g) dephosphorylation of (𝑆. for 𝑗 ∈ { , }) in the cytoplasm. this reaction occurs in both models.
h) key for the molecules in the reactions.

supp. figure :
a) stat (left) and stat (right) phosphorylation kinetics of rpe clones stably expressing wt il- rα after stimulation with il- or after stimulation with hypil- normalized to maximal il- stimulation. data was obtained from three experiments with two technical replicates each, showing mean ± std dev.
b) dose-response experiments for pstat (left) and pstat (right) in rpe cells stably expressing wt il- rα or tyrosine-mutants after stimulation with il- , normalized to maximal stimulation of wt il- rα. data was obtained from one representative experiment with two technical replicates each, showing mean ± std dev.

supp. figure :
a) dose-response experiments for pstat (left) and pstat (right) in rpe cells stably expressing wt il- rα or il- ra-gp chimera after stimulation with il- . data normalized to maximal stimulation of wt il- rα. data was obtained from one representative experiment with two technical replicates each, showing mean ± std dev.
b) stat (left) and stat (right) phosphorylation kinetics in rpe il- rα cells stimulated with il- or hypil- with and without jak inhibition by tofacitinib. data was normalized to maximal il- stimulation. data was obtained from two experiments with two technical replicates each, showing mean ± std dev.
c) stat (left) and stat (right) phosphorylation kinetics in th- cells stimulated with il- or hypil- with and without jak inhibition by tofacitinib. data was normalized to maximal il- stimulation. data was obtained from two biological replicates with two technical replicates each, showing mean ± std dev.
d) mfi ratio of tofacitinib-treated and non-treated th- cells for pstat (left) and pstat (right) after stimulation with il- ( nm) and hypil- ( nm). data was obtained from two biological replicates with two technical replicates each, showing mean ± std dev.

supp. figure :
a) stat (left) and stat (right) phosphorylation kinetics in rpe il- rα cells stimulated with il- or hypil- with and without pretreatment with cycloheximide (chx). data was normalized to maximal il- stimulation. data was obtained from two experiments with two technical replicates each, showing mean ± std dev.
b) stat (left) and stat (right) phosphorylation kinetics in th cells stimulated with il- or hypil- with and without pretreatment with cycloheximide (chx). data was normalized to maximal il- stimulation. data was obtained from two biological replicates with two technical replicates each, showing mean ± std dev.

supp. figure :
a) workflow for quantitative silac phospho-proteomic analysis of th- cells stimulated ( min) with il- ( nm), hypil- ( nm) or left untreated.
b) schematic representation of the main go terms regulated by il as inferred from our p-proteomics studies. red represents downregulated p-sites and blue represents upregulated p-sites upon il stimulation of human primary th- cells.
c) schematic representation of the main go terms regulated by hyil as inferred from our p-proteomics studies. red represents downregulated p-sites and blue represents upregulated p-sites upon hyil stimulation of human primary th- cells.

supp. figure :
a) venn diagrams comparing the numbers of unique upregulated (left) and downregulated (right) phospho-sites by il- (blue) and hypil- (red) as well as the number of shared phospho-sites.
b) list of most strongly altered phosphosites (downregulated: green; upregulated: red) in response to il- (left) or hypil- (right).
c) go analysis “cellular location” and “up keywords” of the phospho-sites regulated by il (red) and hypil- (blue) represented as bubble-plots.
d) phosphorylation of target proteins related to treg functions and schematic representation of their activity on t-cells.

supp. figure :
a) kinetics of gene induction in th- cells induced by il- represented as volcano plots. genes significantly up- or downregulated are highlighted in red (p value ≤ . , fold change ≥ + or ≤ - ). data was obtained from three biological replicates.
b) kinetics of gene induction in th- cells induced by hypil- represented as volcano plots. genes significantly up- or downregulated are highlighted in red (p value ≤ . , fold change ≥ + or ≤ - ). data was obtained from three biological replicates.
c) kinetics of gene induction in th- cells induced by hypil- represented as volcano plots. genes identified to be significantly up- or downregulated by il- are highlighted in red (p value ≤ . , fold change ≥ + or ≤ - ). data was obtained from three biological replicates.

supp. figure :
a) gene induction kinetics represented as pie-charts, separated for upregulated genes (top panel) and downregulated genes (bottom panel).
b) kinetics of isg induction (examples) as heatmap representation comparing il- with hypil- (top) and gsea reactome pathway enrichment “ifn signaling” for genes induced by il- after h (bottom). data represents the mean (log ) fold change of three biological replicates.
c) heatmaps of the top up- and downregulated genes by il- compared to hypil- for h, h and h. data represents the mean (log ) fold change of three biological replicates.
d) kinetics of irf protein expression as a response to continuous il- and hypil- stimulation in th- cells. data was obtained from three biological replicates with two technical replicates each, showing mean ± std dev.

supp. figure :
a) pie charts of proteomic changes (unique & shared) for upregulated (left) and downregulated (right) proteins in response to il- or hypil- stimulation in th- cells.
b) left: gsea reactome pathway enrichment analysis “interferon signaling” for proteins induced by il- . middle: heatmap representation of pathway-associated proteins comparing il- with hypil- stimulation. data represents the mean (log ) fold change of three biological replicates. right: localization of the identified proteins in context to the data distribution of il- -induced proteomic changes. pathway-associated proteins are highlighted for il- (blue) and hypil- (red) as well as non-significant data distribution (grey).
c) left: gsea reactome pathway enrichment analysis “cytokine signaling and immune system” for proteins induced by il- . middle: heatmap representation of pathway-associated proteins comparing il- with hypil- stimulation. data represents the mean (log ) fold change of three biological replicates. right: localization of the identified proteins in context to the data distribution of il- -induced proteomic changes. pathway-associated proteins are highlighted for il- (blue) and hypil- (red) as well as non-significant data distribution (grey).
d) average intensity distribution of untreated proteomic data.
top up- and downregulated proteins (≥ + x or ≤ - x change) altered by il- (left) or hypil- (right) stimulation are indicated. supp. figure : a) pointwise median and % credible intervals of the wt and chimera mathematical models, using the posterior distributions for the parameters from the abc-smc. b) dose response curve in rpe using the posterior distributions from the abc-smc and varying the concentrations of hypil- and il- in the model. c) pointwise median and % credible intervals of the wt mathematical model and simulations of a mutant model with 𝑘#' & = ,> nm- s- and 𝑘#' , = m s- , using the posterior distributions for the parameters from the abc-smc for the other parameters. supp. figure : a) fold induction of total stat and stat levels in th- measured by flow cytometry. data was obtained from two biological replicates. b) total levels of stat and stat measured in cd + by flow cytometry for healthy control (ctrl) and lupus patients (sle). data was obtained from five (ctrl) and six (sle) biological replicates. *p < . , **p ≤ . ,***p ≤ . ; n.s., not significant. c) ratio of pstat and pstat after il- ( min, nm) or hypil- ( min, nm) stimulation measured in cd + by flow cytometry for healthy control (ctrl) and lupus patients (sle). data was obtained from five (ctrl) and six (sle) biological replicates normalized to mean ratio of healthy control samples. d) tofacitinib titration to inhibit stat and stat phosphorylation by il- ( nm) in th- cells (left) and rpe cells stably expressing wt il- rα (right). .cc-by . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . / supp. 
movie : single-molecule co-tracking as a readout for dimerization of cytokine receptors. cell surface labelling of mxfpe-il- rα by enbrho (left, top) and mxfpm-gp by mnbdy (left, bottom) after stimulation with il- ( nm). in the overlay of the zoomed section of both spectral channels (mxfpe-il- rαrho : red, mxfpm-gp dy : blue), yellow lines indicate co-locomotion of il- rα and gp (≥ steps). acquisition frame rate: hz, playback: real time.
supp. movie : dynamics of il- -induced receptor assembly. formation of a single-molecule heterodimer of mxfpe-il- rαrho (red) and mxfpm-gp dy (blue) in presence of il- . yellow lines indicate co-locomotion of il- rα and gp (≥ steps). acquisition frame rate: hz, playback: real time with break at time of receptor dimerization.
supp. movie : ligand-induced heterodimerization of il- rα and gp . overlay of the two spectral channels (mxfpe-il- rαrho : red, mxfpm-gp dy : blue) in absence (left) or presence (right) of il- ( nm). yellow lines indicate co-locomotion of il- rα and gp (≥ steps). acquisition frame rate: hz, playback: real time.
supp. movie : ligand-induced homodimerization of gp . overlay of the two spectral channels (mxfpm-gp rho : red, mxfpm-gp dy : blue) in absence (left) or presence (right) of hypil- ( nm). yellow lines indicate co-locomotion of il- rα and gp (≥ steps). acquisition frame rate: hz, playback: real time.
[figure pages: main figures (fig.) and supplementary figures (supp. fig.). panels plot pstat / rel. mfi against time / min or against c / log nm, p value / -lg against fold change / log (volcano plots), and gsea enrichment score against rank in ordered dataset, and include heatmaps of il- versus hypil- regulated phosphosites, genes and proteins. the graphical content did not survive text extraction.]
engineering the thermotolerant industrial yeast kluyveromyces marxianus for anaerobic growth
wijbrand j. c. dekker, raúl a. ortiz-merino, astrid kaljouw, julius battjes, frank w. wiering, christiaan mooiman, pilar de la torre, and jack t. pronk*
department of biotechnology, delft university of technology, van der maasweg , hz delft, the netherlands
*corresponding author: department of biotechnology, delft university of technology, van der maasweg , hz delft, the netherlands, e-mail: j.t.pronk@tudelft.nl, tel: + .
wijbrand j.c. dekker w.j.c.dekker@tudelft.nl
raúl a.
ortiz-merino raul.ortiz@tudelft.nl https://orcid.org/ - - -
astrid kaljouw astridk @gmail.com
julius battjes juliusbattjes@hotmail.com
frank willem wiering frank.wiering@gmail.com
christiaan mooiman c.mooiman@tudelft.nl
pilar de la torre pilartocortes@gmail.com
jack t. pronk j.t.pronk@tudelft.nl https://orcid.org/ - - -
manuscript for submission in nature biotechnology, section: article.
biorxiv preprint doi: https://doi.org/ . / . . . ; this version posted january , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license (http://creativecommons.org/licenses/by-nc-nd/ . /).
abstract
current large-scale, anaerobic industrial processes for ethanol production from renewable carbohydrates predominantly rely on the mesophilic yeast saccharomyces cerevisiae. use of thermotolerant, facultatively fermentative yeasts such as kluyveromyces marxianus could confer significant economic benefits. however, in contrast to s. cerevisiae, these yeasts cannot grow in the absence of oxygen. responses of k. marxianus and s. cerevisiae to different oxygen-limitation regimes were analyzed in chemostats. genome and transcriptome analysis, physiological responses to sterol supplementation and sterol-uptake measurements identified absence of a functional sterol-uptake mechanism as a key factor underlying the oxygen requirement of k. marxianus. heterologous expression of a squalene-tetrahymanol cyclase enabled oxygen-independent synthesis of the sterol surrogate tetrahymanol in k. marxianus. after a brief adaptation under oxygen-limited conditions, tetrahymanol-expressing k.
marxianus strains grew anaerobically on glucose at temperatures of up to °c. these results open up new directions in the development of thermotolerant yeast strains for anaerobic industrial applications.
keywords: ergosterol, tetrahymanol, anaerobic metabolism, thermotolerance, ethanol production, yeast biotechnology, metabolic engineering
in terms of product volume ( mton y- ), anaerobic conversion of carbohydrates into ethanol by the yeast saccharomyces cerevisiae is the single largest process in industrial biotechnology. for fermentation products such as ethanol, anaerobic process conditions are required to maximize product yields and to minimize both cooling costs and complexity of bioreactors. while s. cerevisiae is applied in many large-scale processes and is readily accessible to modern genome-editing techniques, several non-saccharomyces yeasts have traits that are attractive for industrial application. in particular, the high maximum growth temperature of thermotolerant yeasts, such as kluyveromyces marxianus (up to °c as opposed to °c for s. cerevisiae), could enable lower cooling costs. moreover, it could reduce the required dosage of fungal polysaccharide hydrolases during simultaneous saccharification and fermentation (ssf) processes. however, as-yet-unidentified oxygen requirements hamper implementation of k. marxianus in large-scale anaerobic processes. in s. cerevisiae, fast anaerobic growth on synthetic media requires supplementation with a source of unsaturated fatty acids (ufa), sterols and several vitamins.
these nutritional requirements reflect well-characterized, oxygen-dependent biosynthetic reactions. ufa synthesis involves the oxygen-dependent acyl-coa desaturase ole , nad+ synthesis depends on the oxygenases bna , bna , and bna , while synthesis of ergosterol, the main yeast sterol, even requires moles of oxygen per mole. oxygen-dependent reactions in nad+ synthesis can be bypassed by nutritional supplementation of nicotinic acid, which is a standard ingredient of synthetic media for cultivation of s. cerevisiae. ergosterol and the ufa source tween (polyethoxylated sorbitan oleate) are routinely included in media for anaerobic cultivation as ‘anaerobic growth factors’ (agf). under anaerobic conditions, s. cerevisiae imports exogenous sterols via the abc transporters aus and pdr . mechanisms for uptake and hydrolysis of tween by s. cerevisiae are unknown but, after its release, oleate is activated by the acyl-coa synthetases faa and faa .
outside the whole-genome duplicated (wgd) clade of saccharomycotina yeasts, only a few yeasts (including candida albicans and brettanomyces bruxellensis) are capable of anaerobic growth in synthetic media supplemented with vitamins, ergosterol and tween . however, most currently known yeast species readily ferment glucose to ethanol and carbon dioxide when exposed to oxygen-limited growth conditions, indicating that they do not depend on respiration for energy conservation.
the inability of the large majority of facultatively fermentative yeast species to grow under strictly anaerobic conditions is therefore commonly attributed to incompletely understood oxygen requirements for biosynthetic processes. several oxygen-requiring processes have been proposed, including involvement of a respiration-coupled dihydroorotate dehydrogenase in pyrimidine biosynthesis, limitations in uptake and/or metabolism of anaerobic growth factors, and redox-cofactor balancing constraints.
quantitation, identification and elimination of oxygen requirements in non-saccharomyces yeasts are hampered by the very small amounts of oxygen required for non-dissimilatory purposes. for example, preventing entry of the small amounts of oxygen required for sterol and ufa synthesis in laboratory-scale bioreactor cultures of s. cerevisiae requires extreme measures, such as sparging with ultra-pure nitrogen gas and use of tubing and seals that are resistant to oxygen diffusion. this technical challenge contributes to conflicting reports on the ability of non-saccharomyces yeasts to grow anaerobically, as exemplified by studies on the thermotolerant yeast k. marxianus. paradoxically, the same small oxygen requirements can represent a real challenge in large-scale bioreactors, in which oxygen availability is limited by low surface-to-volume ratios and vigorous carbon-dioxide production.
identification of the non-dissimilatory oxygen requirements of non-conventional yeast species is required to eliminate a key bottleneck for their application in industrial anaerobic processes and, on a fundamental level, can shed light on the roles of oxygen in eukaryotic metabolism. the goal of this study was to identify and eliminate the non-dissimilatory oxygen requirements of the facultatively fermentative, thermotolerant yeast k. marxianus. to this end, we analyzed and compared physiological and transcriptional responses of k. marxianus and s. cerevisiae to different oxygen- and anaerobic-growth-factor limitation regimes in chemostat cultures. based on the outcome of this comparative analysis, subsequent experiments focused on characterization and engineering of sterol metabolism and yielded k. marxianus strains that grew anaerobically at °c.
results
k. marxianus and s. cerevisiae show different physiological responses to extreme oxygen limitation
to investigate oxygen requirements of k. marxianus, physiological responses of strain cbs were studied in glucose-grown chemostat cultures operated at a dilution rate of . h- and subjected to different oxygenation and agf limitation regimes (fig. a). physiological parameters of k. marxianus in these cultures were compared to those of s. cerevisiae cen.pk - d subjected to the same cultivation regimes. in glucose-limited, aerobic chemostat cultures (supplied with . l air·min- , corresponding to mmol o h- ), the crabtree-negative yeast k. marxianus and the crabtree-positive yeast s. cerevisiae both exhibited a fully respiratory dissimilation of glucose, as evident from absence of ethanol production and a respiratory quotient (rq) close to (table ). apparent biomass yields on glucose of both yeasts exceeded . g biomass (g glucose)- and were approximately % higher than previously reported due to co-consumption of ethanol, which was used as solvent for the anaerobic growth factor ergosterol. at a reduced oxygen-supply rate of . mmol o h- , both yeasts exhibited a mixed respiro-fermentative glucose metabolism. rq values close to and biomass-specific ethanol-production rates of . ± .
mmol·g·h- for k. marxianus and . ± . mmol·g·h- for s. cerevisiae (table ) indicated that glucose dissimilation in these cultures was predominantly fermentative. biomass-specific rates of glycerol production, which under oxygen-limited conditions enables re-oxidation of nadh generated in biosynthetic reactions, were approximately . -fold higher (p = . · - ) in k. marxianus than in s. cerevisiae. glycerol production showed that the reduced oxygen-supply rate constrained mitochondrial respiration. however, low residual glucose concentrations (table ) indicated that sufficient oxygen was provided to meet most or all of the biosynthetic oxygen requirements of k. marxianus. to explore growth of k. marxianus under an even more stringent oxygen limitation, we exploited previously documented challenges in achieving complete anaerobiosis in laboratory bioreactors. even in chemostats sparged with pure nitrogen, s. cerevisiae grew on synthetic medium lacking tween and ergosterol, albeit at an increased residual glucose concentration (fig. , table ). in contrast, k. marxianus cultures sparged with pure n and supplemented with both agfs consumed only % of the glucose fed to the cultures. these severely oxygen-limited cultures showed a residual glucose concentration of . ± . g·l- and a low but constant biomass concentration of . ± . g·l- . this pronounced response of k. marxianus to extreme oxygen limitation provided an experimental context for further analyzing its unknown oxygen requirements. s.
cerevisiae can import exogenous sterols under severely oxygen-limited or anaerobic conditions. if this were also true for k. marxianus, omission of ergosterol from the growth medium of severely oxygen-limited cultures would increase biomass-specific oxygen requirements and lead to an even lower biomass concentration. in practice, however, omission of ergosterol led to a small increase of the biomass concentration and a corresponding decrease of the residual glucose concentration in severely oxygen-limited chemostat cultures (fig. b, table ). this observation suggested that, in contrast to s. cerevisiae, k. marxianus cannot replace de novo oxygen-dependent sterol synthesis by uptake of exogenous sterols.

fig. | chemostat cultivation of s. cerevisiae cen.pk - d and k. marxianus cbs under different aeration and anaerobic-growth-factor (agf) supplementation regimes. the ingoing gas flow of all cultures was ml·min- , with oxygen partial pressures of · ppm (o · ), ppm (o ), or < . ppm (o . ). the agfs ergosterol (e) and/or tween (t) were added to media as indicated. a, schematic representation of experimental set-up. data for each cultivation regime were obtained from independent replicate chemostat cultures. b, residual glucose concentrations and
biomass-specific oxygen consumption rates (qo ) under different aeration and agf-supplementation regimes. data represent mean and standard deviation of independent replicate chemostat cultures. c, distribution of consumed glucose over biomass and products in chemostat cultures of s. cerevisiae (left column) and k. marxianus (right column), normalized to a glucose uptake rate of . mol·h- . numbers in boxes indicate averages of measured metabolite formation rates (mol·h- ) and biomass production rates (g dry weight·h- ) for each aeration and agf supplementation regime.

table | physiology of s. cerevisiae cen.pk - d and k. marxianus cbs in glucose-grown chemostat cultures with different aeration and anaerobic-growth-factor (agf) supplementation regimes. cultures were grown at ph . on synthetic medium with urea as nitrogen source and . g·l- glucose (aerobic cultures) or g·l- glucose (oxygen-limited cultures) as carbon and energy source. data are represented as mean ± se of data from independent chemostat cultures for each condition. the agfs ergosterol (e) and tween (t) were added to the media as indicated. cultures were aerated at ml·min- with gas mixtures containing · ppm o (o · ), ppm o (o ) or < . ppm o (o . ). tween was omitted from media used for aerobic cultivation to prevent excessive foaming. ethanol measurements were corrected for evaporation (supplementary fig. ). positive and negative biomass-specific conversion rates (q) represent consumption and production rates, respectively.

s. cerevisiae cen.pk - d k. marxianus cbs
condition
aeration regime o · o o . o . o . o · o o . o .
agf e te te t - e te te t
replicates
d (h- ) . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± .
biomass (g·l- ) . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± .
residual glucose (g·l- ) . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± .
y biomass/glucose (g·g- ) . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± .
y ethanol/glucose (g·g- ) - . ± . . ± . . ± . . ± . - . ± . . ± . . ± .
qglucose (mmol·g·h- ) - . ± . - . ± . - . ± . - . ± . - . ± . - . ± . - . ± . - . ± . - . ± .
qethanol (mmol·g·h- ) - . ± . . ± . . ± . . ± . . ± . - . ± . . ± . . ± . . ± .
rq . ± . . ± . - - - . ± . . ± . - -
glycerol/biomass (mmol·(g biomass)- ) . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± .
carbon recovery (%) . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± .
degree of reduction recovery (%) . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± . . ± .

transcriptional responses of k. marxianus to oxygen limitation involve ergosterol metabolism

to further investigate the non-dissimilatory oxygen requirements of k. marxianus, transcriptome analyses were performed on cultures of s. cerevisiae and k. marxianus grown under the aeration and anaerobic-growth-factor supplementation regimes discussed above. the genome sequence of k. marxianus cbs was only available as a draft assembly and was not annotated. therefore, long-read genome sequencing, assembly and de novo genome annotation were performed, and the annotation was refined by using transcriptome assemblies (data availability). comparative transcriptome analysis of s. cerevisiae and k. marxianus focused on orthologous genes with divergent expression patterns that revealed a strikingly different transcriptional response to growth limitation by oxygen and/or anaerobic-growth-factor availability (fig. ). in s.
cerevisiae, import of exogenous sterols by aus and pdr can alleviate the impact of oxygen limitation on sterol biosynthesis. consistent with this role of sterol uptake, sterol biosynthetic genes in s. cerevisiae were only highly upregulated in severely oxygen-limited cultures when ergosterol was omitted from the growth medium (fig. b, supplementary fig. , contrast ). the mevalonate pathway for synthesis of the sterol precursor squalene, which does not require oxygen, was also upregulated (contrast ), reflecting a relief of feedback regulation by ergosterol. in contrast, k. marxianus showed a pronounced upregulation of genes involved in sterol, isoprenoid and fatty-acid metabolism (fig. ab, fig. , contrast ) in severely oxygen-limited cultures supplemented with ergosterol and tween . no further increase of the expression levels of sterol biosynthetic genes was observed upon omission of these anaerobic growth factors from the medium of these cultures (supplementary fig. , contrast ). these observations suggested that k. marxianus may be unable to import ergosterol when sterol synthesis is compromised. consistent with this hypothesis, co-orthology prediction with proteinortho revealed no orthologs of the s. cerevisiae sterol transporters aus and pdr in k. marxianus. k. marxianus harbors two dihydroorotate dehydrogenases, a cytosolic fumarate-dependent enzyme (kmura ) and a mitochondrial quinone-dependent enzyme (kmura ). in vivo activity of the latter requires oxygen because the reduced quinone is reoxidized by the mitochondrial respiratory chain.
consistent with these different oxygen requirements, the quinone-dependent kmura was down-regulated under severely oxygen-limited conditions, while the fumarate-dependent kmura was upregulated (fig. b, contrast ). upregulation of the fumarate-dependent kmura coincided with increased production of succinate (table ).

fig. | transcriptional response of k. marxianus and s. cerevisiae to oxygen limitation and sterol, tween supplementation. transcriptome analyses were performed for each cultivation regime ( to ) of s. cerevisiae cen.pk - d (scer) and k. marxianus cbs (kmar). data for each regime were obtained from independent replicate chemostat cultures (fig. ). a, comparison of go-term gene-set enrichment analysis of biological processes in contrast of s. cerevisiae and k. marxianus with short description of go-terms (supplementary fig. - ). go-terms were vertically ordered based on their distinct directionality calculated with piano, with go-terms enriched solely with up-regulated genes (blue) at the top, go-terms with mixed- or no-directionality in the middle (white) and go-terms with solely down-regulated genes at the bottom (brown). b, c, d, subsets of differentially expressed orthologous genes obtained from the gene-set analyses for both yeasts in contrasts and , and with genes without orthologs depicted with logfc value of in the respective yeast. b, s. cerevisiae genes previously shown as consistently upregulated under anaerobic conditions in four different nutrient-limitations. c, as described for panel b but for downregulated genes. d, differentially expressed genes uniquely found in this study. e, f, g, h, highlighted gene-sets showing divergent expression patterns across the two yeasts. e, s. cerevisiae genes upregulated in contrast but downregulated in k. marxianus. f, s. cerevisiae genes downregulated in contrast but upregulated in k. marxianus. g, h, similar to e and f but for contrast .

fig. | different transcriptional regulation of ergosterol-biosynthesis in k. marxianus and s. cerevisiae. a, rnaseq was performed on independent replicate chemostat cultures of s. cerevisiae cen.pk - d and k. marxianus cbs for each aeration and anaerobic-growth-factor supplementation regime ( to ; fig. ). b, transcriptional differences in the mevalonate- and ergosterol-pathway genes of s. cerevisiae and k. marxianus for contrasts (o te | o · e), (o . te | o · e), (o . te | o te), (o . t | o . te), (o . | o . t). lumped biochemical reactions are represented by arrows. colors indicate up- (blue) or down-regulation (brown) with color intensity indicating the log fold change with color range capped to a maximum of . reactions are annotated with the corresponding gene; k.
marxianus genes are indicated with the name of the s. cerevisiae orthologs. ergosterol uptake by s. cerevisiae requires additional factors beyond the membrane transporters aus and pdr . no orthologs of the sterol-transporters or hmg were identified for k. marxianus and low read counts for erg , erg and erg precluded differential gene expression analysis across all conditions (dark grey). enzyme abbreviations: erg acetyl-coa acetyltransferase, erg -hydroxy- -methylglutaryl-coa (hmg-coa) synthase, hmg /hmg hmg-coa reductase, erg mevalonate kinase, erg phosphomevalonate kinase, mvd mevalonate pyrophosphate decarboxylase, idi isopentenyl diphosphate:dimethylallyl diphosphate (ipp) isomerase, erg farnesyl pyrophosphate synthetase, erg farnesyl-diphosphate transferase (squalene synthase), erg lanosterol synthase, erg lanosterol α-demethylase, cyb cytochrome b (electron donor for sterol c - desaturation), ncp nadp-cytochrome p reductase, erg c- sterol reductase, erg c- methyl sterol oxidase, erg c- sterol dehydrogenase, erg -keto-sterol reductase, erg endoplasmic reticulum membrane protein (may facilitate protein-protein interactions between erg and erg , or tether these to the er), erg Δ -sterol c-methyltransferase, erg Δ -sterol c-methyltransferase, erg c- sterol desaturase, erg c- sterol desaturase, erg c / sterol reductase, aus /pdr plasma-membrane sterol transporter.

absence of sterol import in k. marxianus

to test the hypothesis that k.
marxianus lacks a functional sterol-uptake mechanism, uptake of the fluorescent sterol derivative -nbd-cholesterol (nbdc) was measured by flow cytometry. because s. cerevisiae sterol transporters are not expressed under aerobic conditions, and to avoid interference by sterol synthesis, nbdc uptake was analysed in anaerobic cell suspensions (fig. a). four hours after nbdc addition to cell suspensions of the reference strain s. cerevisiae imx , median single-cell fluorescence increased by -fold (fig. bc). in contrast, the congenic sterol-transporter-deficient strain imk (aus Δ pdr Δ) only showed a -fold increase of fluorescence, probably reflecting detergent-resistant binding of nbdc to s. cerevisiae cell-wall proteins. k. marxianus strains cbs and nbrc did not show increased fluorescence, either after h or after h of incubation with nbdc (< -fold, fig. bc, supplementary fig. ).

fig. | uptake of the fluorescent sterol derivative nbdc by s. cerevisiae and k. marxianus strains. a, experimental approach. s. cerevisiae strains imx (reference) and imk (aus Δ pdr Δ), and k. marxianus strains nbrc and cbs were each anaerobically incubated in four replicate shake-flask cultures. nbdc and tween (nbdc t) were added to two cultures, while only tween (t) was added to the other two. after h incubation, cells were stained with propidium iodide (pi) and analysed by flow cytometry. pi staining was used to eliminate cells with compromised membrane integrity from analysis of nbdc fluorescence. cultivation conditions and flow cytometry gating are described in methods and in supplementary fig.
, supplementary data set and . b, median and pooled standard deviation of fluorescence intensity (λex nm | λem / nm, fl -a) of pi-negative cells with variance of biological replicates after h exposure to tween (white bars) or tween and nbdc (blue bars). variance was pooled for the strains imx , cbs and nbrc by repeating the experiment. c, nbdc fluorescence-intensity distribution of cells in a sample from a single culture for each strain, shown as modal-scaled density function. dashed lines represent background fluorescence of unstained cells of s. cerevisiae and k. marxianus. fluorescence data for -h incubations with nbdc are shown in supplementary fig. .

engineering k. marxianus for oxygen-independent growth

sterol uptake by s. cerevisiae, which requires cell wall proteins as well as a membrane transporter, has not yet been fully resolved. instead of expressing a heterologous sterol-import system in k. marxianus, we therefore explored production of tetrahymanol, which acts as a sterol surrogate in strictly anaerobic fungi. expression of a squalene-tetrahymanol cyclase from tetrahymena thermophila (ttstc ), which catalyzes the single-step oxygen-independent conversion of squalene into tetrahymanol (fig. a), was recently shown to enable sterol-independent growth of s. cerevisiae. ttstc was expressed in k. marxianus nbrc , which is more genetically amenable than strain cbs . after h of anaerobic incubation, the resulting strain contained . ± . mg·(g biomass)- tetrahymanol, . ± . mg·g- ergosterol and no detectable squalene, while strain nbrc contained . ± . mg·g- squalene and . ± . mg·g- ergosterol (fig. b). in strictly anaerobic cultures on sterol-free medium, strain nbrc grew immediately after inoculation but not after transfer to a second anaerobic culture (fig. c), consistent with ‘carry-over’ of ergosterol from the aerobic preculture. the tetrahymanol-producing strain did not grow under these conditions (fig. c) but showed sustained growth under severely oxygen-limited conditions that did not support growth of strain nbrc (fig. de). single-cell isolates derived from these oxygen-limited cultures (ims , ims , ims , ims ) showed instantaneous as well as sustained growth under strictly anaerobic conditions (fig. f and g). tetrahymanol contents in the first, second and third cycle of anaerobic cultivation of isolate ims were . ± . mg·g- , . ± . mg·g- and . ± . mg·g- , respectively (fig. b), while no ergosterol was detected. to determine whether adaptation of the tetrahymanol-producing strain imx to anaerobic growth involved genetic changes, its genome and those of the four adapted isolates were sequenced (supplementary table ). no copy number variations were detected in any of the four adapted isolates. only strain ims showed two non-conservative mutations in coding regions: a single-nucleotide insertion in a transposon-borne gene and a stop codon at position (of bp) in kmcln , which encodes a g cyclin. the apparent absence of mutations in the three other, independently adapted strains indicated that their ability to grow anaerobically reflected a non-genetic adaptation.

fig.
| sterol-independent anaerobic growth of k. marxianus strains expressing ttstc . a, oxygen-dependent sterol synthesis and cyclisation of squalene to tetrahymanol by ttstc . b, squalene, ergosterol, and tetrahymanol contents with mean and standard error of the mean of (left panel) s. cerevisiae strains imx (reference), imx (sga Δ::ttstc ), and k. marxianus strains nbrc (reference), imx (ttstc ). lipid composition of single-cell isolate ims (ttstc ) (right panel) over serial transfers (c -c ). data from replicate cultures grown in strictly anaerobic (c, f, g) or severely oxygen-limited shake-flask cultures (d, e). aerobically grown pre-cultures were used to inoculate the first anaerobic culture on smg-urea and tween ; when the optical density started to stabilize, the cultures were transferred to new media. data depicted are of each replicate culture (points) and the mean (dotted line) from independent biological duplicate cultures; serial-transfer cultures are represented with c -c . strains nbrc (wild-type, upward red triangles), imx (ttstc , cyan downward triangle), and the single-cell isolates ims (ttstc , orange circles), ims (ttstc , blue circles), ims (ttstc , yellow circles), ims (ttstc , purple circles). s. cerevisiae imx (reference, purple circle) and imx (ttstc , orange circles). c, extended data with double inoculum size is available in supplementary fig. . d, extended data is available in supplementary fig. a.

test of anaerobic thermotolerance and selection for fast growing anaerobes

one of the attractive phenotypes of k.
marxianus for industrial application is its high thermotolerance with reported maximum growth temperatures of - °c. to test if anaerobically growing tetrahymanol-producing strains retained thermotolerance, strain ims was grown in anaerobic sequential-batch-reactor (sbr) cultures (fig. ) in which, after an initial growth cycle at °c, the growth temperature was shifted to °c. the specific growth rate at °c progressively increased from . h- to . h- over sbr cycles (corresponding to ca. generations; fig. b). a subsequent temperature increase to °c led to a strong decrease of the specific growth rate which, after approximately generations of selective growth, stabilized at approximately . h- . whole-population genome sequencing of the evolved populations revealed no common mutations or chromosomal copy number variations (supplementary table ). these data show that ttstc -expressing k. marxianus can grow anaerobically at temperatures up to at least °c.

fig. | thermotolerance and anaerobic growth of tetrahymanol-producing k. marxianus strain. the strain ims was grown in triplicate sequential batch bioreactor cultivations in synthetic media supplemented with g·l- glucose and mg·l- tween at ph . .
a, experimental design of sequential batch fermentation with cycles at step-wise increasing temperatures to select for faster growing mutants. each cycle consisted of three phases: (i) (re)filling of the bioreactor with fresh media up to ml and adjustment of temperature to a new set-point, (ii) anaerobic batch fermentation at a fixed culture temperature with continuous n sparging for monitoring of co in the culture off-gas, and (iii) fast broth withdrawal leaving ml ( . fold dilution) to inoculate the next batch. b, estimated maximum specific growth rate (circles) of each batch cycle for the three independent bioreactor cultivations (m r blue, m r orange, m l grey) with the estimated number of generations. the growth rate was calculated from the co production as measured in the off-gas and should be interpreted as an estimate; in some cases it could not be calculated. the culture temperature profile (dotted line) for each independent bioreactor cultivation (blue, grey, orange) consisted of a step-wise increment of the temperature at the onset of the fermentation phase in each batch cycle. c, representative section of co off-gas profiles of the individual bioreactor (m r) cultivation over time with co fraction (orange line) and culture temperature (grey dotted line); data of the entire experiment are available in supplementary fig. (data availability).

discussion

industrial production of ethanol from carbohydrates relies on s. cerevisiae, due to its capacity for efficient, fast alcoholic fermentation and growth under strictly anaerobic process conditions.
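the growth-rate estimate mentioned in the caption of the sequential-batch figure (calculated from co2 production measured in the off-gas) and the estimated number of generations per cycle can be illustrated with a minimal python sketch. this is not the authors' procedure, which is not detailed in this section: the sketch assumes that, during exponential growth, the off-gas co2 signal is proportional to biomass, so that the slope of ln(co2 signal) against time approximates the specific growth rate, and that each cycle regrows to the same final density, so that the number of doublings equals log2 of the dilution factor. function names, volumes and data points are hypothetical.

```python
import math


def specific_growth_rate(times_h, co2_signal):
    """Least-squares slope of ln(CO2 signal) versus time, in h^-1.

    Assumption (not from the paper): the CO2 signal is proportional to
    biomass during the exponential phase selected by the caller.
    """
    if len(times_h) != len(co2_signal) or len(times_h) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log(c) for c in co2_signal]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    return sxy / sxx


def generations_per_cycle(full_volume, residual_volume):
    """Doublings needed to regrow from the residual broth to a full reactor."""
    return math.log2(full_volume / residual_volume)


# Synthetic demonstration data (hypothetical values, not taken from the study).
mu_true = 0.2  # h^-1
times = [0.0, 1.0, 2.0, 3.0, 4.0]
co2 = [0.05 * math.exp(mu_true * t) for t in times]

mu_est = specific_growth_rate(times, co2)
gens = generations_per_cycle(100.0, 12.5)  # hypothetical 8-fold dilution
print(f"estimated mu = {mu_est:.3f} h^-1, generations per cycle = {gens:.1f}")
```

in practice only the exponential part of each batch cycle would be passed to the fit; noisy or truncated co2 profiles would make the slope unreliable, which is consistent with the caption's remark that the rate could not be calculated in some cases.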
many facultatively fermentative yeast species outside the saccharomycotina wgd-clade also rapidly ferment sugars to ethanol under oxygen-limited conditions, but cannot grow and ferment in the complete absence of oxygen. identifying and eliminating oxygen requirements of these yeasts is essential to unlock their industrially relevant traits for application. here, this challenge was addressed for the thermotolerant yeast k. marxianus, using a systematic approach combining chemostat-based quantitative physiology, genome and transcriptome analysis, sterol-uptake assays and genetic modification. s. cerevisiae, which was used as a reference in this study, shows strongly different genome-wide expression profiles under aerobic and anaerobic or oxygen-limited conditions. although only a small fraction of these differences was conserved in k. marxianus (fig. ), we were able to identify the absence of a functional sterol import system as the critical cause for its inability to grow anaerobically. enabling synthesis of the sterol surrogate tetrahymanol yielded strains that grew anaerobically at temperatures above the permissive temperature range of s. cerevisiae. a short adaptation phase of tetrahymanol-producing k. marxianus strains under oxygen-limited conditions reproducibly enabled strictly anaerobic growth. although this ability was retained after aerobic isolation of single-cell lines, we were unable to attribute this adaptation to mutations. in contrast to wild-type k.
marxianus, a non-adapted tetrahymanol-producing strain did not show ‘carry-over growth’ after transfer from aerobic to strictly anaerobic conditions, and adapted cultures showed reduced squalene contents (fig. ). these observations suggest that interactions between tetrahymanol, ergosterol and/or squalene influence the onset of anaerobic growth and that oxygen-limited growth results in a stable balance between these lipids that is permissive for anaerobic growth. comparative genomic studies in saccharomycotina yeasts have previously led to the hypothesis that sterol transporters are absent from pre-wgd yeast species. while our observations on k. marxianus reinforce this hypothesis, which was hitherto not experimentally tested, they do not exclude involvement of additional oxygen-requiring reactions in other non-saccharomyces yeasts. for example, pyrimidine biosynthesis is often cited as a key oxygen-requiring process in non-saccharomyces yeasts, due to involvement of a respiratory-chain-linked dihydroorotate dehydrogenase (dhod). k. marxianus is among a small number of yeast species that, in addition to this respiration-dependent enzyme (kmura ), also harbor a fumarate-dependent dhod (kmura ). in k. marxianus, activation of this oxygen-independent kmura is a crucial adaptation for anaerobic pyrimidine biosynthesis. the experimental approach followed in the present study should be applicable to resolve the role of pyrimidine biosynthesis and other oxygen-requiring reactions in additional yeast species. enabling k. marxianus to grow anaerobically represents an important step towards application of this thermotolerant yeast in large-scale anaerobic bioprocesses. however, specific growth rates and biomass yields of tetrahymanol-expressing k. marxianus in anaerobic cultures were lower than those of wild-type s. cerevisiae strains. a similar phenotype of tetrahymanol-producing s. cerevisiae was proposed to reflect an increased membrane permeability.
additional membrane engineering or expression of a functional sterol transport system is therefore required for further development of robust, anaerobically growing industrial strains of k. marxianus.

online methods

yeast strains, maintenance and shake-flask cultivation

saccharomyces cerevisiae cen.pk - d (mata mal - c suc ) was obtained from dr. peter kötter, j.w. goethe university, frankfurt. kluyveromyces marxianus strains cbs (atcc ; ncyc ; nrrl y- ) and nbrc (ifo ) were obtained from the westerdijk fungal biodiversity institute (utrecht, the netherlands) and the biological resource center, nite (nbrc) (chiba, japan), respectively. stock cultures of s. cerevisiae were grown at °c in an orbital shaker set at rpm, in ml shake flasks containing ml ypd ( g·l- bacto yeast extract, g·l- bacto peptone, g·l- glucose). for cultures of k. marxianus, the glucose concentration was reduced to . g·l- . after addition of glycerol to early stationary-phase cultures, to a concentration of % (v/v), ml aliquots were stored at - °c. shake-flask precultures for bioreactor experiments were grown in ml synthetic medium (sm) with glucose as carbon source and urea as nitrogen source (smg-urea). for anaerobic cultivation, synthetic medium was supplemented with ergosterol ( mg·l- ) and tween ( mg·l- ) as described previously.

expression cassette and plasmid construction

plasmids used in this study are described in table .
to construct plasmids pude (grnaaus ) and pude (grnapdr ), the pros plasmid backbone was pcr amplified using phusion hf polymerase (thermo scientific, waltham, ma) with the double-binding primer . pcr amplifications were performed with desalted or page-purified oligonucleotide primers (sigma-aldrich, st louis, mo) according to the manufacturer’s instructions. to introduce the grna-encoding nucleotide sequences into grna-expression plasmids, a μm fragment was first amplified with primers and containing the specific sequence as primer overhang, using pros as template. pcr products were purified with the genelute pcr clean-up kit (sigma-aldrich) or the gel dna recovery kit (zymo research, irvine, ca). the two dna fragments were then assembled by gibson assembly (new england biolabs, ipswich, ma) according to the manufacturer’s instructions. gibson assembly reaction volumes were downscaled to µl, with . pmol·µl- dna fragments at a : molar ratio, and reactions were incubated for h at °c. chemically competent e. coli xl -blue was transformed with the gibson assembly mix via a min incubation on ice followed by a s heat shock at °c and h recovery in non-selective lb medium. transformants were selected on lb agar containing the appropriate antibiotic. golden gate assembly with the yeast tool kit was performed in µl reaction mixtures containing . µl bsai hf v (neb, #r ), µl dna ligase buffer with atp (new england biolabs), . µl t -ligase (neb) with fmol dna donor fragments and milliq water. before ligation at °c was initiated by addition of t dna ligase, an initial bsai digestion ( min at °c) was performed.
then cycles of digestion and ligation at °c and °c, respectively, were performed, with min incubation times for each reaction. thermocycling was terminated with a min final digestion step at °c. to construct a ttstc expression vector, the coding sequence of ttstc (pud ) was pcr amplified with primer pair / and golden gate assembled with the donor plasmids pggkd (ori ampr), pp (kmpdc p) and pytk (scadh t), resulting in pude (ori ampr kmpdc p-ttstc -scadh t). for integration of the ttstc cassette into the lac locus, both upstream and downstream flanks ( / bps) of the lac locus were pcr amplified with the primer pairs / and / , respectively. an empty integration vector, pggkd , was constructed by bsai golden gate cloning of pytk (gfp-dropout), pytk (hygb), pytk (kanr), pytk (conre’) and pytk (conls’) together with the two lac homologous nucleotide sequences. plasmid assembly was verified by pcr amplification with primers , , and and by digestion with bsmbi (new england biolabs, #r ). the integration vector pudi with the ttstc expression cassette was constructed by gibson assembly of the pcr amplified pggkd and pude with primer pairs / and / , thereby adding bp overlaps for assembly. for this step, the incubation time of the gibson assembly was increased to min. plasmid assembly was verified by diagnostic pcr amplification using dreamtaq polymerase (thermo scientific) with primers , , and subsequent illumina short-read sequencing.

table | strains used in this study. abbreviations: saccharomyces cerevisiae (sc), kluyveromyces marxianus (km), tetrahymena thermophila (tt).
genus | strain | relevant genotype | reference
s. cerevisiae | cen.pk - d | mata ura his leu trp mal - c suc | entian and kötter,
s. cerevisiae | imx | cen.pk - d can Δ::cas -natnt | mans et al.,
s. cerevisiae | imx | imx sga Δ::ttstc | wiersma et al.,
s. cerevisiae | imk | imx aus Δ | this study
s. cerevisiae | imk | imx pdr Δ | this study
s. cerevisiae | imk | imx aus Δ pdr Δ | this study
k. marxianus | cbs | ura his leu trp | cbs-knaw*
k. marxianus | nbrc | ura his leu trp | nbrc**
k. marxianus | imx | kmpdc p-ttstc -scadh t-hygb | this study
k. marxianus | ims | kmpdc p-ttstc -scadh t-hygb | this study
k. marxianus | ims | kmpdc p-ttstc -scadh t-hygb | this study
k. marxianus | ims | kmpdc p-ttstc -scadh t-hygb | this study
k. marxianus | ims | kmpdc p-ttstc -scadh t-hygb | this study
k. marxianus | ims | kmpdc p-ttstc -scadh t-hygb | this study
k. marxianus | ims | kmpdc p-ttstc -scadh t-hygb | this study

table | crispr grna target sequences used in this study. grna target sequences are shown with pam sequences underlined. position in orf indicates the base pair after which the cas -mediated double-strand break is introduced. at score indicates the at content of the -bp target sequence and rna score indicates the fraction of unpaired nucleotides of the -bp target sequence, predicted with the complete grna sequence using a minimum free energy prediction by the rnafold algorithm.

locus | target sequence ( '- ') | position in orf (bp), at score, rna score
aus | cattattgtaaatgatttggtgg | / .
pdr | atctttcatataaataacatagg | / .

table | plasmids used in this study. restriction enzyme recognition sites are indicated in superscript. us/ds represent upstream and downstream homologous recombination sequences used for genomic integration into the k. marxianus lac locus. abbreviations: saccharomyces cerevisiae (sc), kluyveromyces marxianus (km), tetrahymena thermophila (tt).

plasmid | characteristics | source
pggkd | ori ampr conls gfp conr | hassing et al.,
pggkd | ori kanr notikmlac us bsmbiconre’bsaisfgfpbsai conls’bsmbi hygb kmlac dsnoti | this study
pp | ori camr kmpdc p | rajkumar et al.,
pros | ori ampr μm amdsym psnr -grnacan prsnr -grnaade | mans et al.,
pud | ori kanr ttstc | wiersma et al.,
pude | ori ampr μm amdsym psnr -grnaaus prsnr -grnaaus | this study
pude | ori ampr μm amdsym psnr -grnapdr prsnr -grnapdr | this study
pude | ori ampr kmpdc p-ttstc -scadh t | this study
pudi | ori kanr notikmlac us kmpdc p-ttstc -scadh t hygb kmlac dsnoti | this study
pytk | ori camr conls’ | lee et al.,
pytk | ori camr gfp dropout | lee et al.,
pytk | ori camr scadh t | lee et al.,
pytk | ori camr conre' | lee et al.,
pytk | ori camr hygb | lee et al.,

table | oligonucleotide primers used in this study.
primer sequence ( '-> ')
tgcgcatgtttcggcgttcgaaacttctccgcagtgaaagataaatgatccattattgtaaatgatttgggttttagagctagaaatagcaagttaaaataag
tgcgcatgtttcggcgttcgaaacttctccgcagtgaaagataaatgatcatctttcatataaataacatgttttagagctagaaatagcaagttaaaataag
tagtaaagactgctgtaattcatctctcagtccttgcagtctgctttttctggaattaattaccatttttaaatatatttctactttctacttaatagcaattttaattaatctaattat
ataattagattaattaaaattgctattaagtagaaagtagaaatatatttaaaaatggtaattaattccagaaaaagcagactgcaaggactgagagatgaattacagcagtctttacta
tagcaaaaaaattcacaactaaacacgatagagtaaaattagagaagcaacgcctcgcggtcagtgaatagcgttccgttagaaaacattcaaaattacctaatactattcaacagttct
agaactgttgaatagtattaggtaattttgaatgttttctaacggaacgctattcactgaccgcgaggcgttgcttctctaattttactctatcgtgtttagttgtgaatttttttgcta
tgtcactacagccacagcag
ttggtaaggcgccacactag
agagaagcgccacatagacg
tgcatatgctacgggtgacg
cacccaagtatggtgggtag
aagcatcgtctcatcggtctcatatgtcaatttcaaagtacttcactcccgttgctgac
ttatgccgtctcaggtctcaggatttagttctgtacaggcttcttc
ttatgccgtctcaggtctcaagaattagttctgtacaggcttcttc
aagcatcgtctcatcggtctcatatgtctttcactaaaatcgctgccttattag
ttatgccgtctcaggtctcaggatatcataagagcatagcagcggcaccggcaatag
aagcatcgtctcatcggtctcacaatgaaagtgattgaagaaccctcaaac
ttatgccgtctcaggtctcaagggttaagcaattggatcctacc
aagcatcgtctcatcggtctcagagttgcttaattagcttgtacatggctttg
ttatgccgtctcaggtctcatcgggaaggcccatattgaagacg
cccaaatcatttacaataatggatcatttatc
catgttatttatatgaaagatgatcatttatc
gtccctaggttcgtcatt
caagatcaatggtggctctc

strain construction

the lithium-acetate/polyethylene-glycol method was used for yeast transformation. homologous repair (hr) dna fragments for markerless crispr-cas -mediated gene deletions in s. cerevisiae were constructed by annealing two bp primers, using primer pairs / and / for deletion of pdr and aus , respectively. after transformation of s. cerevisiae imx with grna plasmids pude and pude and double-stranded repair fragments, transformants were selected on synthetic medium with acetamide as sole nitrogen source. deletion of aus and pdr was confirmed by pcr amplification with primer pairs / and / , respectively. loss of grna plasmids was induced by cultivation of single-colony isolates on ypd, after which plasmid loss was assessed by absence of growth of single-cell isolates on synthetic medium with acetamide as nitrogen source. an aus Δ pdr Δ double-deletion strain was similarly constructed by chemical transformation of s. cerevisiae imk with pude and repair dna. to integrate a ttstc expression cassette into the k. marxianus lac locus, k. marxianus nbrc was transformed with μg of noti-digested pudi dna. after centrifugation, cells were resuspended in ypd and incubated at °c for h. cells were then again centrifuged, resuspended in demineralized water and plated on agar containing µg·l- hygromycin b (invivogen, toulouse, france) and µg·l- x-gal ( -bromo- -chloro- -indolyl-β-d-galactopyranoside; fermentas, waltham, ma). colonies that could not convert x-gal were analyzed for correct genomic integration of the ttstc by diagnostic pcr with primers , and . genomic integration of ttstc into the chromosome outside the lac locus was confirmed by short-read illumina sequencing.

chemostat cultivation

chemostat cultures were grown at °c in l bioreactors (applikon, delft, the netherlands) with a stirrer speed of rpm. the dilution rate was set at . h- and a constant working volume of . l was maintained by connecting the effluent pump to a level sensor. cultures were grown on synthetic medium with vitamins.
concentrated glucose solutions were autoclaved separately at °c for min and added at the concentrations indicated, along with sterile antifoam pluronic pe (basf, ludwigshafen, germany; final concentration . g·l- ). before autoclaving, bioreactors were tested for gas leakage by submerging them in water while applying a . bar overpressure. anaerobic conditions of bioreactor cultivations were maintained by continuous reactor headspace aeration with pure nitrogen gas (≤ . ppm o , hiq nitrogen . , linde ag, schiedam, the netherlands) at a flowrate of ml n min- ( . vvm). a headspace gas pressure of . bar was set with a reduction valve (tescom europe, hannover, germany) and remained constant during cultivation. to prevent oxygen diffusion into the cultivation, the bioreactor was equipped with fluran tubing ( barrer o , f- -a, saint-gobain, courbevoie, france) and viton o-rings (eriks, alkmaar, the netherlands), and no ph probes were mounted. the medium reservoir was deoxygenated by sparge aeration with nitrogen gas (≤ ppm o , hiq nitrogen . , linde ag). for aerobic cultivation, the reactor was sparged continuously with dried air at a flowrate of ml air min- ( . vvm). dissolved oxygen levels were analyzed by clark electrodes (applisens, applikon) and remained above % during the cultivation. for micro-aerobic cultivations, nitrogen (≤ ppm o , hiq nitrogen . , linde ag) and air were mixed continuously by controlling the fractions of mass flow rate of the dry gas to a total flow of ml min- per bioreactor.
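The gas-mixing arithmetic for the micro-aerobic regime can be illustrated with a short sketch. It is not the authors' control software: it assumes ideal-gas mixing (so flow fractions equal mole fractions), a dry-air oxygen fraction of 0.2095, and negligible oxygen in the nitrogen supply unless stated. The function names, the 0.2095 constant and all numbers in the usage lines are hypothetical and chosen only for illustration.

```python
AIR_O2_FRACTION = 0.2095  # assumed O2 mole fraction of dry air (not from the source)


def mix_flows(target_o2_fraction, total_flow_ml_min, n2_o2_fraction=0.0):
    """Return (air_flow, n2_flow) in mL/min that give the target O2 fraction.

    Solves x * AIR + (1 - x) * N2 = target for the air share x,
    assuming ideal mixing of the two dry gas streams.
    """
    if not n2_o2_fraction <= target_o2_fraction <= AIR_O2_FRACTION:
        raise ValueError("target O2 fraction must lie between the two supply gases")
    air_share = (target_o2_fraction - n2_o2_fraction) / (AIR_O2_FRACTION - n2_o2_fraction)
    air_flow = air_share * total_flow_ml_min
    return air_flow, total_flow_ml_min - air_flow


def vvm(gas_flow_ml_min, working_volume_ml):
    """Gas flow expressed as volumes of gas per volume of culture per minute."""
    return gas_flow_ml_min / working_volume_ml


if __name__ == "__main__":
    # Hypothetical example values, not taken from the study.
    air, n2 = mix_flows(target_o2_fraction=0.005, total_flow_ml_min=500.0)
    print(f"air: {air:.1f} mL/min, nitrogen: {n2:.1f} mL/min")
    print(f"vvm: {vvm(500.0, 1000.0):.2f}")
```

Keeping the total flow fixed and varying only the air share means the aeration rate in vvm stays constant across oxygen regimes, which is the property the mixing description relies on.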
the mixed gas was distributed to each bioreactor and analyzed separately in real-time. continuous cultures were assumed to be in steady state when, after at least volume changes, culture dry weight and the specific carbon dioxide production rates changed by less than %. cell density was routinely measured at a wavelength of nm with a jenway spectrophotometer (cole palmer, staffordshire, uk). cell dry weight of the cultures was determined by filtering exactly ml of culture broth over pre-dried and weighed membrane filters ( . µm, thermo fisher scientific), which were subsequently washed with demineralized water, dried in a microwave oven ( min, w) and weighed again.

metabolite analysis

for determination of substrate and extracellular metabolite concentrations, culture supernatants were obtained by centrifugation of culture samples ( min at rpm) and analyzed by high-performance liquid chromatography (hplc) on a waters alliance hplc (waters, ma, usa) equipped with a bio-rad hpx- h ion exchange column (biorad, veenendaal, the netherlands) operated at °c with a mobile phase of mm h so at a flowrate of . ml·min- . compounds were detected by means of a dual-wavelength absorbance detector (waters ) and a refractive index detector (waters ) and compared to reference compounds (sigma-aldrich). residual glucose concentrations in continuous cultivations were determined by hplc analysis from culture samples rapidly quenched with cold steel beads.
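The steady-state criterion stated for the chemostat cultures (a minimum number of volume changes, then small relative changes in dry weight and specific CO2 production rate) can be written as a simple check. This is a minimal sketch under assumptions: the text does not say over which window the change is evaluated, so the last two consecutive samples are compared here, and the threshold values are left as parameters because they are not recoverable from the text. All names are hypothetical.

```python
def volume_changes(dilution_rate_per_h, elapsed_h):
    """Number of working-volume replacements in a chemostat: D * t."""
    return dilution_rate_per_h * elapsed_h


def relative_change(previous, current):
    """Absolute relative change between two consecutive measurements."""
    return abs(current - previous) / abs(previous)


def is_steady_state(n_volume_changes, dry_weights, qco2_values,
                    min_volume_changes, max_rel_change):
    """Apply the two-part criterion to the most recent pair of samples.

    dry_weights and qco2_values are time-ordered measurements;
    min_volume_changes and max_rel_change must be supplied by the user.
    """
    if n_volume_changes < min_volume_changes:
        return False
    if len(dry_weights) < 2 or len(qco2_values) < 2:
        return False
    return (relative_change(dry_weights[-2], dry_weights[-1]) < max_rel_change
            and relative_change(qco2_values[-2], qco2_values[-1]) < max_rel_change)


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    n = volume_changes(dilution_rate_per_h=0.1, elapsed_h=60.0)
    print(is_steady_state(n, [2.00, 2.02], [10.0, 10.1],
                          min_volume_changes=5, max_rel_change=0.03))
```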
gas analysis

the off-gas from bioreactor cultures was cooled with a condenser ( °c) and dried with a permapure dryer (inacom instruments, veenendaal, the netherlands) prior to analysis of the carbon dioxide and oxygen fraction with a rosemount nga analyzer (baar, switzerland). the rosemount gas analyzer was calibrated with defined mixtures of . % o , . % co and high quality nitrogen gas n (linde ag).

ethanol evaporation rate

to correct for ethanol evaporation in the continuous bioreactor cultivations, the ethanol evaporation rate was determined in the same experimental bioreactor set-up without the yeast. ethanol ( mm) was added to sm glucose media with urea, after which the decrease in the ethanol concentration was measured over time by periodic sampling and quantification by hplc analysis over the course of at least hours. to reflect the media composition used for the different oxygen regimes and anaerobic growth factor supplementation, the ethanol evaporation was measured for bioreactor sparge aeration with tween and for bioreactor head-space aeration both with and without tween . the ethanol evaporation rate was measured for each condition in triplicate.

lipid extractions & gc analysis

for analysis of triterpene and triterpenoid cell contents, biomass was harvested, washed once with demineralized water and stored as pellet at - °c before freeze-drying the pellets using an alpha - ld plus (martin christ, osterode am harz, germany) at - °c and . mbar. freeze-dried biomass was saponified with .
m naoh (bio-ultra, sigma-aldrich) in methylation glass tubes (pyrex borosilicate glass, thermo fisher scientific) at °c. as internal standard, α-cholestane (sigma-aldrich) was added to the saponified biomass suspension. subsequently, tert-butyl-methyl-ether (tbme, sigma-aldrich) was added for organic phase extraction. samples were extracted twice using tbme and dried with sodium-sulfate (merck, darmstadt, germany) to remove remaining traces of water. the organic phase was either concentrated by evaporation with n gas aeration or transferred directly to an injection vial (vwr international, amsterdam, the netherlands). the contents were measured by gc-fid using an agilent a gas chromatograph (agilent technologies, santa clara, ca) equipped with an agilent cp column (agilent). the oven was programmed to start at °c for min, ramp first to °c with °c·min- and secondly to °c with a rate of °c·min- with a final temperature hold of min. spectra were compared to separate calibration lines of squalene, ergosterol, α-cholestane, cholesterol and tetrahymanol as described previously.

sterol uptake assay

sterol uptake was monitored by the uptake of fluorescently labelled -nbd-cholesterol (avanti polar lipids, alabaster, al). a stock solution of -nbd-cholesterol (nbdc) was prepared in ethanol under an argon atmosphere and stored at - °c. shake flasks with ml sm glucose media were inoculated with yeast strains from a cryo-stock and cultivated aerobically at rpm at °c overnight. the yeast cultures were subsequently diluted to an od of . in ml sm glucose media in ml shake flasks to gradually reduce the availability of oxygen and incubated overnight. yeast cultures were transferred to fresh sm media with g·l- glucose and incubated under anaerobic conditions at °c at rpm. after hours of anaerobic incubation, µg·l- nbd-cholesterol with mg·l- tween were pulsed to the cultures. samples were taken and washed twice with pbs containing ml·l- tergitol np- (ph . ; sigma-aldrich) before resuspension in pbs and subsequent analysis. propidium iodide (pi) (invitrogen) was added to the sample ( µm) and samples were stained according to the manufacturer’s instructions. pi intercalates with dna in cells with a compromised cell membrane, which results in red fluorescence. samples both unstained and stained with pi were analyzed with an accuri c flow cytometer (bd biosciences, franklin lakes, nj) with a nm laser, and fluorescence was measured with an emission filter of / nm (fl ) for nbd-cholesterol and > nm (fl ) for pi. cell gating and median fluorescence of cells were determined using flowjo (v , bd bioscience). cells were gated based on forward scatter (fsc) and side scatter (ssc) to exclude potential artifacts or clumping cells. within this gated population, pi-positive and pi-negative cells were differentiated based on the cell fluorescence across a fl fl dimension. flow cytometric gates were drafted for each yeast species and used for all samples. the gating strategy is given in supplementary fig. . fluorescence of a strain was determined by a sample of cells from independent shake-flask cultures and compared to cells from identical unstained cultures of cells with the exact same chronological age. the staining experiment of the strains imx , cbs and nbrc samples was repeated twice for reproducibility; the mean and pooled variance were subsequently calculated from the biological duplicates of the two experiments.
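How a mean and pooled variance can be combined across two experiments with biological duplicates is sketched below. The text does not give the exact formula used, so this assumes the standard pooled variance, in which each experiment's sample variance is weighted by its degrees of freedom, and a grand mean over all replicates. The function names and the fluorescence values are hypothetical.

```python
from statistics import mean, variance


def pooled_variance(groups):
    """Pooled variance: sum((n_i - 1) * s_i^2) / sum(n_i - 1).

    Each group is the list of replicate values from one experiment and
    must contain at least two values so that its variance is defined.
    """
    if any(len(group) < 2 for group in groups):
        raise ValueError("every group needs at least two replicates")
    degrees_of_freedom = sum(len(group) - 1 for group in groups)
    weighted = sum((len(group) - 1) * variance(group) for group in groups)
    return weighted / degrees_of_freedom


def summarize(groups):
    """Return (grand mean over all replicates, pooled variance)."""
    all_values = [value for group in groups for value in group]
    return mean(all_values), pooled_variance(groups)


if __name__ == "__main__":
    # Hypothetical median fluorescence values: two experiments, two duplicates each.
    experiment_1 = [10.0, 12.0]
    experiment_2 = [20.0, 24.0]
    print(summarize([experiment_1, experiment_2]))
```

Pooling within-experiment variances keeps day-to-day offsets between the two experiments from inflating the variance estimate, which is one plausible reason to report a pooled value rather than the variance of all four replicates together.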
the nbdc intensity and cell counts obtained from the nbdc experiments are available for re-analysis in supplementary data set , and raw flow cytometry plots are depicted in supplementary data set .

long read sequencing, assembly, and annotation

cells were grown overnight in -ml shake flasks containing ml liquid ypd medium at °c in an orbital shaker at rpm. after reaching stationary phase, the cells were harvested for a total od of by centrifugation for min at g. genomic dna of cbs and nbrc was isolated using the qiagen genomic dna /g kit (qiagen, hilden, germany) according to the manufacturer’s instructions. minion genomic libraries were prepared using the d genomic dna by ligation (sqk-lsk ) for cbs , and the d native barcoding genomic dna (exp-nbd & lsk ) for nbrc according to the manufacturer’s instructions with the exception of using % etoh during the ‘end repair/da-tailing module’ step. flow cell quality was tested by running the minknow platform qc (oxford nanopore technologies, oxford, uk). flow cells were prepared by removing μl buffer and subsequently primed with priming buffer. the dna library was loaded dropwise into the flow cell for sequencing. the sqk-lsk library was sequenced on a r chemistry flow cell (flo-min ) for h. base-calling was performed using albacore (v . . , oxford nanopore technologies) for cbs , and for nbrc with guppy (v . . , oxford nanopore technologies) using dna_r . . _ bps_flipflop.cfg. cbs reads were assembled using canu (v . ), and nbrc reads were assembled using flye (v . . -b ). assemblies were polished with pilon (v .
) using illumina data available at the sequence read archive under accessions srx and srx . both de novo genome assemblies were annotated using funannotate (v . . ), trained and refined using de novo transcriptome assemblies (see below), adding functional annotation with interproscan (v . - . ).

illumina sequencing

plasmids were sequenced on a miniseq (illumina, san diego, ca) platform. library preparation was performed with nextera xt dna library preparation according to the manufacturer’s instructions (illumina). the library preparation included the miniseq mid output kit ( cycles) and the input & final dna was quantified with the qubit hs dsdna kit (life technologies, thermo fisher scientific). nucleotide sequences were assembled with spades and compared to the intended in silico dna construct. for whole-genome sequencing, yeast cells were harvested from overnight cultures and dna was isolated with the qiagen genomic dna /g kit (qiagen) as described earlier. dna quantity was measured with the qubit br dsdna kit (thermo fisher scientific). paired-end libraries ( bp) were prepared with the truseq dna pcr-free library prep kit (illumina) according to the manufacturer’s instructions. short read whole-genome sequencing was performed on a miseq platform (illumina).

rna isolation, sequencing and transcriptome analysis

culture broth from chemostat cultures was directly sampled into liquid nitrogen to prevent mrna turnover. the cell cultures were stored at - °c and processed within days after sampling. after thawing on ice, cells were harvested by centrifugation.
total rna was extracted by a min heatshock at °c with a mix of isoamyl alcohol, phenol and chloroform at a ratio of : : , respectively (invitrogen). rna was extracted from the organic phase with tris-hcl and subsequently precipitated by the addition of m na-acetate and % (v/v) ethanol at - °c. precipitated rna was washed with ethanol, collected and, after drying, resuspended in rnase free water. the quantity of total rna was determined with a qubit rna br assay kit (thermo fisher scientific). rna quality was determined by the rna integrity number with rna screen tape using a tapestation (agilent). rna libraries were prepared with the truseq stranded mrna lt protocol (illumina, # ) and subjected to paired-end sequencing ( bp read length, novaseq illumina) by macrogen (macrogen europe, amsterdam, the netherlands). pooled rnaseq libraries were used to perform de novo transcriptome assembly using trinity (v . . ), which was subsequently used as evidence for both cbs and nbrc genome annotations. rnaseq libraries were mapped to the cbs genome assembly described above, using bowtie (v . . . ) with parameters (-v -k --best -m ) to allow no mismatches, select the best out of possible alignments per read, and, for reads having more than one possible alignment, randomly report only one. alignments were filtered and sorted using samtools (v . . ). read counts were obtained with featurecounts (v . . ) using parameters (-b -c) to only count reads for which both pairs are aligned to the same chromosome. differential gene expression (dge) analysis was performed using edger (v . . ).
genes with read counts in all conditions were filtered out from the analysis, as were genes with fewer than counts per million. counts were normalized using the trimmed mean of m values (tmm) method, and dispersion was estimated using generalized linear models. differentially expressed genes were then calculated using a log ratio test adjusted with the benjamini-hochberg method. absolute log fold-change values > , false discovery rate < . , and p value < . were used as significance cutoffs. gene set analysis (gsa) based on gene ontology (go) terms was used to get a functional interpretation of the dge analysis. for this purpose, go terms were first obtained for the s. cerevisiae cen.pk - d (gca_ . ) and k. marxianus cbs genome annotations using funannotate and interproscan as described above. afterwards, funannotate compare was used to get (co)ortholog groups of genes generated with proteinortho using the following public genome annotations: s. cerevisiae s c (gcf_ . ), k. marxianus nbrc (gca_ . ) and k. marxianus dmku - (gcf_ . ), in addition to the new genome annotations generated here for s. cerevisiae cen.pk - d, and k. marxianus cbs and nbrc . predicted go terms for s. cerevisiae cen.pk - d and k. marxianus cbs were kept, and merged with those from corresponding (co)orthologs from s. cerevisiae s c. genes with term go: (ribosome) were not considered for further analyses. gsa was then performed with piano (v . . ). gene set statistics were first calculated with the stouffer, wilcoxon rank-sum test, and reporter methods implemented in piano. afterwards, consensus results were derived by p-value and rank aggregation, considered significant if absolute fold change values > . complexheatmap (v . . ) was used to draw gsa results into fig. , highlighting differentially expressed genes found in a previous study. dge and gsa were performed using r (v . . ).

anaerobic growth experiments

anaerobic shake-flask experiments were performed in a bactron anaerobic workstation (bactron - , sheldon manufacturing, cornelius, or) at °c. the gas atmosphere consisted of % n , % co and % h and was maintained anaerobic by a pd catalyst. the catalyst was regenerated by heating to °c every week and interchanged by placing it in the airlock whenever the pass-box was used. -ml shake flasks were filled with ml ( % volumetric) media and placed on an orbital shaker (ks basic, ika, staufen, germany) set at rpm inside the anaerobic chamber. sterile growth media was placed inside the anaerobic chamber h prior to inoculation to ensure complete removal of traces of oxygen. the anaerobic growth ability of the yeast strains was tested on smg-urea with g·l- glucose at ph . with tween prepared as described earlier. the growth experiments were started from aerobic pre-cultures on smg-urea media and the anaerobic shake flasks were inoculated at an od of . (corresponding to an od of . ). in order to minimize opening the anaerobic chamber, culture growth was monitored by optical density measurements inside the chamber using an ultrospec cell density meter (biochrom, cambridge, uk) at a nm wavelength. when the optical density of a culture no longer increased or decreased, new shake-flask cultures were inoculated by serial transfer at an initial od of . .

laboratory evolution in low oxygen atmosphere

adaptive laboratory evolution for strict anaerobic growth was performed in a bactron anaerobic workstation (bactron bac-x- e, sheldon manufacturing) at °c. -ml shake flasks were filled with ml smg-urea with g·l- glucose and including mg·l- tween . subsequently, the shake-flask media were inoculated with imx from glycerol cryo-stock at od < . and thereafter placed inside the anaerobic chamber. due to frequent opening of the pass-box and the lack of a catalyst inside the pass-box, oxygen entry was more permissive. after the optical density of the cultures no longer increased, cultures were transferred to new media by - x serial dilution. for ims , ims , ims three and for ims , ims , ims four serial transfers in shake-flask media were performed, after which single colony isolates were made by plating on ypd agar media with hygromycin antibiotic at °c aerobically. single colony isolates were subsequently restreaked sequentially three times on the same media before the isolates were propagated in sm glucose media and glycerol cryo stocked. to determine if an oxygen-limited pre-culture was required for the strict anaerobic growth of the imx strain, a cross-validation experiment was performed. in parallel, yeast strains were cultivated in -ml shake-flask cultures with smg-urea with g·l- glucose at ph . with tween in both the bactron anaerobic workstation (bactron bac-x- e, sheldon manufacturing) with low levels of oxygen contamination, and in the bactron anaerobic workstation (bactron - , sheldon manufacturing) with strict control of oxygen contamination. after stagnation of growth was observed in the second serial transfer of the shake-flask cultures, a . ml sample of each culture was taken, sealed, and used to inoculate fresh media in the other bactron anaerobic workstation.
simultaneously, the original culture was used to inoculate fresh media in the same bactron anaerobic workstation, thereby resulting in parallel cultures of each strain, of which half were derived from the other bactron anaerobic workstation.

laboratory evolution in sequential batch reactors

laboratory evolution for selection of fast growth at high temperatures was performed in triplicate in -ml multifors (infors benelux, velp, the netherlands) bioreactors with a working volume of ml, for the strain ims on smg g·l⁻¹ glucose media with tween . anaerobic conditions were created and maintained by continuous aeration of the cultures with ml·min⁻¹ ( . vvm) n2 gas and continuous aeration of the media vessels with n2 gas. the ph was set at . and maintained by the continuous addition of sterile m koh. growth was monitored by analysis of the co2 in the bioreactor off-gas, and a new empty-refill cycle was initiated when at least hours of batch time had elapsed and the co2 signal had dropped to % of the maximum reached in that batch. the dilution factor of each empty-refill cycle was . -fold ( ml working volume, ml residual volume). the first batch fermentation was performed at °c, after which the temperature was increased to °c in the second batch and maintained for consecutive sequential batches. after the batch cycle at °c, the culture temperature was again increased, to °c, and maintained subsequently. growth rate was calculated from the co2 production, as measured by the co2 fraction in the culture off-gas, in essence as described previously.
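the off-gas-based growth-rate calculation summarized in this section (conversion of the co2 fraction to an evolution rate, cumulative summation, log transformation, stepwise slopes, iterative exclusion of low slopes) can be sketched in python as follows. this is a minimal sketch under stated assumptions, not the authors' code: the function name, the gas-flow argument, the ideal-gas molar volume of 24 l·mol⁻¹, the trapezoidal summation and the choice to report the mean of the retained slopes are all illustrative assumptions that the text does not specify.

```python
import math
import statistics


def growth_rate_from_co2(time_h, co2_fraction, gas_flow_l_per_h,
                         molar_volume_l_per_mol=24.0):
    """Estimate a specific growth rate (per hour) from off-gas CO2 data.

    All parameter names and the default molar volume are assumptions
    made for illustration; they are not taken from the source text.
    """
    # 1. CO2 fraction -> CO2 evolution rate in mmol per hour
    #    (assumes off-gas flow equals inlet flow and ideal-gas behaviour).
    cer = [x * gas_flow_l_per_h / molar_volume_l_per_mol * 1000.0
           for x in co2_fraction]

    # 2. Sum over time within the cycle (trapezoidal rule, an assumption).
    cumulative, total = [], 0.0
    for i in range(1, len(time_h)):
        total += 0.5 * (cer[i] + cer[i - 1]) * (time_h[i] - time_h[i - 1])
        cumulative.append(total)  # value at time_h[i]

    # 3. Natural-log transform of the cumulative CO2 profile.
    log_cum = [math.log(c) for c in cumulative]

    # 4. Stepwise slope between consecutive log-transformed points.
    #    log_cum[j] belongs to time_h[j + 1].
    slopes = [(log_cum[j + 1] - log_cum[j]) / (time_h[j + 2] - time_h[j + 1])
              for j in range(len(log_cum) - 1)]

    # 5. Iteratively drop slopes more than one standard deviation
    #    below the mean until no further point is excluded.
    kept = list(slopes)
    while len(kept) > 2:
        cutoff = statistics.mean(kept) - statistics.stdev(kept)
        filtered = [s for s in kept if s >= cutoff]
        if len(filtered) == len(kept):
            break
        kept = filtered

    # Reporting the mean of the retained slopes is an assumption.
    return statistics.mean(kept)


# Hypothetical synthetic data: CO2 fraction rising exponentially.
times = [0.5 * i for i in range(41)]
fractions = [0.001 * math.exp(0.3 * t) for t in times]
print(growth_rate_from_co2(times, fractions, gas_flow_l_per_h=30.0))
```

because the cumulative sum restarts from zero in each cycle, the earliest stepwise slopes in this sketch lie above the underlying exponential rate; how the original analysis treated the start of each batch is not stated in the text.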
in short, the co2 fraction in the off-gas was converted to a co2 evolution rate in mmol per hour and subsequently summed over time for each cycle. the corresponding cumulative co2 profile was transformed to its natural logarithm, after which the stepwise slope of the log-transformed data was calculated. subsequently, datapoints of the stepwise slope of the log-transformed cumulative co2 profile were excluded iteratively, with an exclusion criterion of more than one standard deviation below the mean.

variant calling

dna sequencing reads were aligned to the nbrc genome described above, including an additional sequence with the ttstc construct, and used to detect sequence variants using a method previously reported. briefly, reads were aligned using bwa (v . . -r -dirty), alignments were processed using samtools (v . . ) and picard tools (v . . -snapshot) (http://broadinstitute.github.io/picard), and variants were then called using the genome analysis toolkit (v . - - -gf c c ef) haplotypecaller in discovery and gvcf modes. variants were only called at sites with a minimum variant confidence normalized by unfiltered depth of variant samples (qd) of , read depth (dp) ≥ , and genotype quality (gq) > , excluding a . kb region in chromosome containing rdna. variants were annotated using the genome annotation described above, including the ttstc construct, with snpeff (v . ) and vcfannotator (http://vcfannotator.sourceforge.net).

statistics

statistical tests were performed as two-sided t-tests with unequal variance unless specifically stated otherwise.
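as a minimal sketch of the unequal-variance (welch) t-test named above, the test statistic and the welch-satterthwaite degrees of freedom can be computed with the python standard library alone. the function name and the example values are hypothetical, and the sketch stops short of the two-sided p-value, which requires the t distribution.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    # Difference of means scaled by the combined standard error.
    t_stat = (statistics.mean(sample_a) - statistics.mean(sample_b)) \
        / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation; no equal-variance assumption.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical measurements, for illustration only.
t_stat, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

where scipy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the two-sided p-value; whether the original analysis used that or another implementation is not stated here.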
we denote technical replicates as measurements derived from a single cell culture. biological replicates are measurements originating from independent cell cultures. independent experiments are two experiments with identical set-up performed on different days. where possible, variances from independent experiments with identical set-up were pooled, but independent experiments from time-course experiments (anaerobic growth studies) are reported separately. p-values were corrected for multiple-hypothesis testing, which is specifically reported each time. no data were excluded based on the resulting outcome.

data availability

data supporting the findings of this work are available within the paper, and source data for all figures in this study are available at the www.data. tu.nl repository with the doi: . / . the raw rna-sequencing data that support the findings of this study are available from the gene expression omnibus (geo) website (https://www.ncbi.nlm.nih.gov/geo/) with number gse . whole-genome sequencing data of the cbs , nbrc and evolved strains were deposited at ncbi (https://www.ncbi.nlm.nih.gov/) under bioproject accession number prjna .

code availability

the code used to generate the results obtained in this study is archived in a gitlab repository (https://gitlab.tudelft.nl/rortizmerino/kmar_anaerobic).

author’s contributions

wd and jtp designed the study and wrote the manuscript.
wd performed molecular cloning, bioreactor cultivation experiment, transcriptome analysis and sterol-uptake experiments. jb contributed to bioreactor cultivation experiments and molecular cloning. fw contributed to the molecular cloning and sterol-uptake experiments. ak and cm contributed to bioreactor experiments and transcriptome studies. pdlt performed plasmid and genome sequencing. ro contributed to transcriptome analysis and performed sequence annotation and assembly.

acknowledgements

we thank mark bisschops and hannes jürgens for fruitful discussions. we thank erik de hulster for fermentation support and marcel van den broek for input on the bioinformatics analyses.

competing interest

wd and jtp are co-inventors on a patent application that covers aspects of this work. the authors declare no conflict of interest.

funding

this work was supported by advanced grant (grant # ) of the european research council to jtp.

description of additional supplementary files

supplementary data set | overview of flow cytometry samples with meta-data. meta-data table of file names, frequency of cells compared to parent, number of cells in each group, strain name, time point of fluorescence measurement after hours ( ) or hours ( ), staining of cells with propidium-iodide (pi) with value (pi) or without pi staining (-), staining of cells with tween nbd-cholesterol (tn) or with tween only (t), with species names abbreviated k. marxianus (km) or s. cerevisiae (sc). [example picture of file flowcyto_table.xlsx]

supplementary data set | flow cytometry non-gated data of fl -a versus fl -a of all samples. flow cytometry data showing fluorescent nbdc uptake by k. marxianus, s. cerevisiae strains with for each sample the intensity of counts (pseudo-colored) for / nm (fl ) for nbdc and > nm (fl ) for pi.
[example of first row of flowcyto_fl _fl .pdf]

[example table with header: filename strain time point pi # day staining cells/pi-ne cells/pi-po cells/pi-ne; rows for cbs and imx .fcs files not recoverable]

supplemental material for: engineering the thermotolerant industrial yeast kluyveromyces marxianus for anaerobic growth

wijbrand j. c. dekker, raúl a. ortiz-merino, astrid kaljouw, julius battjes, frank wiering, christiaan mooiman, pilar de la torre, and jack t. pronk*

department of biotechnology, delft university of technology, van der maasweg , hz delft, the netherlands

*corresponding author: department of biotechnology, delft university of technology, van der maasweg , hz delft, the netherlands, e-mail: j.t.pronk@tudelft.nl, tel: + .
supplementary fig. | ethanol evaporation rate. ethanol concentration over time with a reactor volume of ml sm glucose urea media maintained at °c, stirred at rpm and aerated with a volumetric gas flow rate of ml·min⁻¹. the reactor off-gas was cooled by passing it through a condenser cooled at °c. circles and orange line represent the condition with sparge aeration and tween (t) media supplementation, diamonds and blue line represent head-space aeration with tween , and triangles and red line represent head-space aeration with tween omission. data represent mean with standard deviation from three independent reactor experiments. [plot axes: ethanol concentration (mm) versus time (h)]

agf | aeration type | ethanol evaporation (mmol·h⁻¹)
t | sparge | . ± .
t | head-space | . ± .
  | head-space | . ± .

supplementary fig. | consensus biological process go term enrichment for k. marxianus contrast . go terms are clustered according to their rank. see legend of fig. for experimental details.

supplementary fig. | consensus biological process go term enrichment for k. marxianus contrast . go terms are clustered according to their rank. see legend of fig. for experimental details.
supplementary fig. | consensus biological process go term enrichment for s. cerevisiae contrast . go terms are clustered according to their rank. see legend of fig. for experimental details.

supplementary fig. | consensus biological process go term enrichment for s. cerevisiae contrast . go terms are clustered according to their rank. see legend of fig. for experimental details.

supplementary fig. | go term enrichment comparison of biological process of k. marxianus (kmar) to s. cerevisiae (scer) of contrast . go terms were annotated with the color of distinct directionality (up (blue) down (brown)) and the color intensity was determined by the magnitude of the inverse rank.
go terms with significant mixed-directionality or non-directionality, as having no pronounced distinct directionality, are colored white. shared go terms between k. marxianus and s. cerevisiae are connected by a line.

supplementary fig. | uptake of the fluorescent sterol derivative nbdc by s. cerevisiae and k. marxianus strains after h staining. flow cytometry data of fig. with prolonged staining after pulse-addition of nbd-cholesterol to the shake-flask cultures for h. bar charts of the median and pooled standard deviation of the nbd-cholesterol fluorescence intensity of pi-negative cells with pooled variance from the biological replicate cultures. see legend fig. for experimental details.

supplementary fig. | flow cytometry gating strategy of both k. marxianus (left panel) and s. cerevisiae (right panel) samples. gates were set per species for all samples, independent of nbdc staining. density of events was calculated by flowjo software and represented in pseudo-color (blue low density, red high density). the gate between pi-negative and pi-positive was inside the “cells” gated population.
supplementary fig. | cross-validation of oxygen-limited and anaerobic growth of k. marxianus imx . strains were grown in shake-flask cultures in an oxygen-limited (a) and strict anaerobic environment (b). to perform cross-validation between the two experiments running in parallel, a . ml aliquot of each culture was sealed and transferred quickly between anaerobic chambers and used to inoculate two shake-flask cultures, represented with crossed-arrows (⤮). the cultures from the strain nbrc (⤮) in the third transfer (c ) in the strict anaerobic environment (b) were hence inoculated from an aliquot of the cultures of nbrc (c ) grown in oxygen-limited environment (a). this resulted in a serial transfer of . times dilution from transfer c to c . aerobically grown pre-cultures were used to inoculate the first anaerobic culture on smg-urea containing g·l- glucose and tween . data depicted are of each replicate culture (points) and the mean (dotted line) from independent biological duplicate cultures; serial transfer cultures are represented with the number of the respective transfer (c - ).
supplementary fig. | sterol-independent anaerobic growth of s. cerevisiae imx (reference), imx (ttstc ), k.
marxianus nbrc (reference) and imx (ttstc ). aerobically grown pre-cultures were used to inoculate shake-flask cultures with smg-urea containing g·l- glucose and tween in a strict anaerobic environment at an od of . for all strains, and both at od of . and . for nbrc and imx . data depicted are of each replicate culture (points) and the mean (dotted line) from independent biological duplicate cultures; serial transfer cultures are represented with the number of the respective transfer (c - ).
supplementary fig. | co fraction in the off-gas of k. marxianus ims . production of co as measured by the fraction of co in the off-gas of the individual bioreactor cultivations of the k. marxianus strain ims on smg media ph . with g·l- glucose and mg·l- tween over time (left panels). the temperature profile was incrementally increased at the beginning of a new batch cycle (right panels). after h the performance of the off-gas analyzer of replicate m r deteriorated.
supplementary table | mutations identified by whole-genome sequencing in comparison to the reference k. marxianus strain imx .
overview of mutations detected in the strains after selection for strict anaerobic growth (ims , ims , ims , ims ) compared to the ttstc engineered strain (imx ). ims resequenced after transfers in strict anaerobic conditions is, for clarity, referred to by the strain name ims . overview of mutations of the bioreactor populations after prolonged selection for anaerobic growth at elevated temperatures, represented by the bioreactor replicates (m r, m r, and m l). mutations in coding regions are annotated as synonymous (syn), non-synonymous (nsy), insertions or deletions. mutations in non-coding regions are reported with the identifier of the neighboring gene, directionality and strand (+/-). for k. marxianus genes, corresponding s. cerevisiae orthologs with the s c identifier are listed if applicable. qd refers to quality by depth calculated by gatk and genotyping overviews are given per strain using the gatk fields gt: / for homozygous alternative, / for heterozygous, ad: allelic depth (number of reads per reference and alternative alleles called), dp: approximate read depth at the corresponding genomic position, and gq: genotype quality. na indicates variants were not called in that position in the corresponding strain.
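the per-strain genotype cells described in this legend follow the colon-separated gt:ad:dp:gq layout. as a reading aid only, the sketch below shows one way such a cell could be parsed. the names `Genotype` and `parse_genotype` and the example values are hypothetical and are not taken from the table, and the sketch assumes the usual vcf convention that allele 0 is the reference and that ad lists the reference read count first.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Genotype:
    """One per-strain table cell in GT:AD:DP:GQ layout (hypothetical helper)."""
    gt: str               # genotype call, e.g. "1/1" or "0/1"
    ad: Tuple[int, ...]   # reads per allele, reference allele first (assumed)
    dp: int               # approximate read depth at the position
    gq: int               # genotype quality

    @property
    def zygosity(self) -> str:
        # Treat phased ("|") and unphased ("/") separators alike.
        alleles = self.gt.replace("|", "/").split("/")
        if "." in alleles:
            return "no call"
        if all(a == "0" for a in alleles):
            return "homozygous reference"
        if len(set(alleles)) == 1:
            return "homozygous alternative"
        return "heterozygous"

    @property
    def alt_fraction(self) -> float:
        # Fraction of allele-supporting reads that are not reference reads.
        total = sum(self.ad)
        return (total - self.ad[0]) / total if total else 0.0


def parse_genotype(cell: str) -> Optional[Genotype]:
    """Parse a cell such as "1/1:0,25:25:75"; return None for "NA" (variant not called)."""
    cell = cell.strip()
    if cell.upper() == "NA":
        return None
    gt, ad, dp, gq = cell.split(":")
    return Genotype(gt, tuple(int(x) for x in ad.split(",")), int(dp), int(gq))


# Illustrative values only; they do not come from the supplementary table.
example = parse_genotype("0/1:12,13:25:99")
print(example.zygosity, example.alt_fraction)
```

for population samples such as the bioreactor replicates, the alternative-read fraction derived from ad may be more informative than the discrete gt call, since a mixed population need not match a diploid genotype model.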
chromosome position description type kmar id s c systid gene qd imx ims ims ims ims ims m r m r m l
mutation spectra of imx derived single isolates after selection for strict anaerobic growth
asp- - asp cds:(syn) tpuv _ ydr c gcn na / : , : : na na na / : , : : / : , : : / : , : : / : , : :
codon: tca cds:insertion[ ] tpuv _ transposon na / : , : : na / : , : : / : , : : / : , : : / : , : : / : , : : / : , : :
trp- - stp cds:(non) tpuv _ yal c cln na / : , : : na na na / : , : : / : , : : / : , : : / : , : :
tpuv _ -t p utr :+ tpuv _ ygr w pti na na / : , : : / : , : : / : , : : / : , : : / : , : : / : , : : na
tpuv _ -t p utr :- tpuv _ ybr c ssh na na / : , : : na na / : , : : na na na
utp p utr :+ tpuv _ ygr w utp na / : , : : na na / : , : : / : , : : na na na
mutations in whole populations after selection for anaerobic growth at elevated temperatures
codon: aat cds:deletion[- ] tpuv _ ylr w lug na na na na na na na na / : , : :
codon: cag cds:insertion[ ] tpuv _ no similarity na na na na na na na na / : , : :
abstract results k. marxianus and s.
cerevisiae show different physiological responses to extreme oxygen limitation transcriptional responses of k. marxianus to oxygen limitation involve ergosterol metabolism absence of sterol import in k. marxianus engineering k. marxianus for oxygen-independent growth test of anaerobic thermotolerance and selection for fast growing anaerobes discussion online methods yeast strains, maintenance and shake-flask cultivation expression cassette and plasmid construction strain construction chemostat cultivation metabolite analysis gas analysis ethanol evaporation rate lipid extractions & gc analysis sterol uptake assay long read sequencing, assembly, and annotation illumina sequencing rna isolation, sequencing and transcriptome analysis anaerobic growth experiments laboratory evolution in low oxygen atmosphere laboratory evolution in sequential batch reactors statistics data availability code availability author’s contributions acknowledgements competing interest funding references description of additional supplementary files reporting summary supplemental material for: the scfmet ubiquitin ligase senses cellular redox state to regulate the transcription of sulfur metabolism gene the scfmet ubiquitin ligase senses cellular redox state to regulate the transcription of sulfur metabolism genes zane johnson , yun wang , benjamin m. sutter , benjamin p. tu * department of biochemistry, university of texas southwestern medical center, dallas, tx - *correspondence and lead contact: benjamin.tu@utsouthwestern.edu .cc-by . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . 
/ summary in yeast, control of sulfur amino acid metabolism relies upon met , a transcription factor which activates the expression of a network of enzymes responsible for the biosynthesis of cysteine and methionine. in times of sulfur abundance, the activity of met is repressed via ubiquitination by the scfmet e ubiquitin ligase, but the mechanism by which the f-box protein met senses sulfur status to tune its e ligase activity remains unresolved. here, using a combination of genetics and biochemistry, we show that met utilizes exquisitely redox-sensitive cysteine residues in its wd- repeat region to sense the availability of sulfur metabolites in the cell. oxidation of these cysteine residues in response to sulfur starvation inhibits binding and ubiquitination of met , leading to induction of sulfur metabolism genes. our findings reveal how scfmet dynamically senses redox cues to regulate synthesis of these special amino acids, and further highlight the mechanistic diversity in e ligase-substrate relationships. introduction the biosynthesis of sulfur-containing amino acids supplies cells with increased levels of cysteine and methionine, as well as their downstream metabolites glutathione and s-adenosylmethionine (sam). glutathione serves as a redox buffer to maintain the reducing environment of the cell and provide protection against oxidative stress, while sam serves as the methyl donor for nearly all methyltransferase enzymes (ljungdahl and daignan-fornier, , cantoni, ). in the yeast saccharomyces cerevisiae, biosynthesis of all sulfur metabolites can be performed de novo via enzymes encoded in the gene transcriptional network known as the met regulon. activation of the met gene transcriptional program under conditions of sulfur starvation relies on the transcription factor met and additional transcriptional co-activators that allow met to be recruited to the met genes (kuras et al., , blaiseau and thomas, ). 
when yeast cells sense sufficiently high levels of sulfur in the environment, the met gene transcriptional program is negatively regulated by the activity of the scf e ligase met (scfmet ) through ubiquitination of the master transcription factor met (kaiser et al., ). met is unique as an e ligase substrate as it contains an internal ubiquitin interacting motif (uim) which folds in and caps the growing ubiquitin chain generated by scfmet , resulting in a proteolytically stable but transcriptionally inactive oligo-ubiquitinated state (flick et al., ). upon sulfur starvation, scfmet ceases to ubiquitinate met , allowing met to become deubiquitinated and transcriptionally active. since its discovery, much effort has gone into understanding how met senses the sulfur status of the cell. several mechanisms have been attributed to met to describe how met and itself work together to regulate levels of met gene transcripts in response to the availability of sulfur or the presence of toxic heavy metals (thomas et al., ). after the discovery that met is an e ligase that negatively regulates met through ubiquitin-dependent and both proteolysis-dependent and independent mechanisms (rouillon et al., , flick et al., , kuras et al., ), it was found that met dissociates from scf complexes upon cadmium addition, resulting in the disruption of the aforementioned ubiquitin-dependent regulatory mechanisms (barbey et al., ).
it was later reported that this cadmium-specific dissociation of met from scf complexes is mediated by the cdc /p aaa+ atpase complex, and that met ubiquitination is required for cdc to strip met from these complexes (yen et al., ). in parallel, attempts to identify the sulfur metabolic cue sensed by met suggested that cysteine, or possibly some downstream metabolite, was required for the degradation of met by scfmet , although glutathione was reportedly not involved in this mechanism (hansen and johannesen, , menant et al., ). a genetic screen for mutants that fail to repress met gene expression found that cho d cells, which are defective in the synthesis of phosphatidylcholine (pc) from phosphatidylethanolamine (pe), have elevated sam levels and a deficiency in cysteine (sadhu et al., ). however, while met and met have been studied extensively for over two decades, the biochemical mechanisms by which met senses and responds to the presence or absence of sulfur remain incompletely understood (sadhu et al., ).
herein, we utilize prototrophic yeast strains grown in sulfur-rich and sulfur-free respiratory conditions to elucidate the mechanism by which met senses sulfur. using a combination of in vivo and in vitro experiments, we find that instead of sensing any single sulfur-containing metabolite, met indirectly senses the levels of sulfur metabolites in the cell by acting as a sensor of redox state. we describe a novel mechanism by which an f-box protein can be regulated through the use of multiple cysteine residues as redox sensors that, upon oxidation, disrupt binding of the e ligase to its target to enable the activation of a coordinated transcriptional response.
results
synthesis of cysteine is more important than methionine for met ubiquitination
previous work in our lab has characterized the metabolic and cellular response of yeast cells following a switch from rich lactate media (ypl) to minimal lactate media (sl) (wu and tu, , sutter et al., , laxman et al., , kato et al., , yang et al., , ye et al., , ye et al., ). under such respiratory conditions, yeast cells engage regulatory mechanisms that might otherwise be subject to glucose repression. among other phenotypes, this switch results in the acute depletion of sulfur metabolites and the activation of the met gene regulon (sutter et al., , ye et al., ). to better study the response of yeast cells to sulfur starvation, we reformulated our minimal lactate media to contain no sulfate, as prototrophic yeast can assimilate sulfur in the form of inorganic sulfate into reduced sulfur metabolites. after switching cells from yp lactate media (rich) to the new minimal sulfur-free lactate media (−sulfur), we found that met and met quickly respond to sulfur starvation through the extensively studied ubiquitin-dependent mechanisms regulating met activity (figure a) (yen et al., , flick et al., , barbey et al., , kaiser et al., , flick et al., ). as previously observed, the deubiquitination of met resulted in the activation of the met genes (figure b) and corresponded well with changes in observed sulfur metabolite levels (figure c). addition of sulfur metabolites quickly rescued met activity and resulted in the re-ubiquitination of met and the repression of the met genes.
as previously noted, met activation in response to sulfur starvation results in the emergence of a second, faster-migrating proteoform of met , which disappears after rescuing yeast cells with sulfur metabolites (sadhu et al., ). we found that the appearance of this proteoform is dependent on both met and new translation, as it was not observed in either met d cells or cells treated with cycloheximide during sulfur starvation (figure s a and c). additionally, this proteoform persists after rescue with a sulfur source in the presence of a proteasome inhibitor (figure s b). we hypothesized that this faster-migrating proteoform of met might be the result of translation initiation at an internal methionine residue. in support of this possibility, mutation of methionine residues , , and to alanine blocked the appearance of a lower form during sulfur starvation (figure s d). conversely, deletion of the first amino acids containing the first three methionine residues of met resulted in expression of a met proteoform that migrated at the apparent molecular weight of the wild type short form and did not generate a new, even-faster migrating proteoform under sulfur starvation (figure s d). moreover, the met m / / a and met d - strains expressing either solely the long or short form of the met protein had no obvious phenotype with respect to met ubiquitination or growth in high or low sulfur media (figure s e). we conclude that the faster-migrating proteoform of met that is produced during sulfur starvation has no discernible effect on sulfur metabolic regulation under these conditions. the sulfur amino acid biosynthetic pathway is bifurcated into two branches at the central metabolite homocysteine, where this precursor metabolite commits either to the production of cysteine or methionine (figure e). 
after confirming met and met were responding to sulfur starvation as expected, we sought to determine whether the cysteine or methionine branch of the sulfur metabolic pathway was sufficient to rescue met e ligase activity and re-ubiquitinate met after sulfur starvation. to determine whether the synthesis of methionine is necessary to rescue met activity, cells lacking methionine synthase (met d) were fed either homocysteine or methionine after switching to sulfur-free lactate (−sulfur) media. interestingly, cells fed homocysteine were still able to ubiquitinate and degrade met , while methionine-fed cells appeared to oligo-ubiquitinate and stabilize met (figure d). these observations are consistent with previous reports and suggest met and met interpret sulfur sufficiency through both branches of sulfur metabolism to a degree (hansen and johannesen, , kaiser et al., , kuras et al., , flick et al., , menant et al., , sadhu et al., ), with the stability of met , but not the e ligase activity of met , apparently dependent on the methionine branch. to determine whether met specifically responds to cysteine, cells lacking cystathionine beta-lyase (str d), the enzyme responsible for the conversion of cystathionine to homocysteine, were starved of sulfur and fed either cysteine or methionine. this mutant is incapable of synthesizing methionine from cysteine via the two-step conversion of cysteine into the common precursor metabolite homocysteine.
our results show cysteine was able to rescue met activity even in a str d mutant, further suggesting cysteine or a downstream metabolite, and not methionine, as the signal of sulfur sufficiency for met (figure d). cysteine residues in met are oxidized during sulfur starvation the synthesis of cysteine from homocysteine contributes to the production of the downstream tripeptide metabolite glutathione (gsh), which exists at millimolar concentrations in cells and is the major cellular reductant for buffering against oxidative stress (cuozzo and kaiser, , wu et al., ). specifically, glutathione serves to neutralize reactive oxygen species such as peroxides and free radicals, detoxify heavy metals, and preserve the reduced state of protein thiols (pompella et al., , penninckx, ). considering the relatively high number of cysteine residues in met (figure a), we sought to determine if these residues might become oxidized during acute sulfur starvation. utilizing the thiol-modifying agent methoxy-peg-maleimide (mpeg k-mal), which adds ~ kda per reduced cysteine residue, we assessed met cysteine oxidation in vivo by western blot. theoretically, full modification of the cysteines in met by mpeg k-mal should significantly shift the apparent molecular weight of met by ~ - kda. as expected, met in sulfur-replete rich media migrates at ~ kda (figure b, first lane), nicely corresponding to the modification of most if not all of its cysteine residues, suggesting they are all in the reduced state while sulfur levels are high and met is being negatively regulated. however, after shifting into sulfur-free minimal lactate media, met migrates at ~ kda — suggesting the majority of its cysteine residues are rapidly becoming oxidized in vivo following acute sulfur starvation (figure b, second and third lane). in contrast, the loading control rpn contains a single cysteine residue, and did not exhibit significant oxidation within the same time period of sulfur starvation. 
as expected, repletion of sulfur metabolites led to the reduction and modification of met ’s cysteine residues by mpeg k-mal to the extent seen in the rich media condition. such oxidation and re-reduction of met cysteines corresponds well with met ubiquitination status (figure b). additionally, when cells were grown in sulfur-free media containing glucose (sfd) as the carbon source, met also becomes oxidized, although on a slower timescale, suggesting this mechanism is not specific to yeast grown under non-fermentable conditions (figure c).
considering the link between sulfur starvation and oxidative stress, we next assessed whether simply changing the redox state of sulfur-starved cells could mimic sulfur repletion with respect to met e ligase activity. addition of the potent, membrane-permeable reducing agent dtt to yeast cells starved of sulfur readily reversed met cysteine oxidation. dtt also resulted in the partial re-ubiquitination of met , suggesting that met cysteine redox status influences its ubiquitination activity against met (figure d). taken together, these data strongly suggest cysteine residues within met are poised to become rapidly oxidized in response to sulfur starvation, which is correlated with the deubiquitination of its substrate met .
met cysteine point mutants exhibit dysregulated sulfur sensing in vivo
after establishing met cysteine redox status as an important factor in sensing sulfur starvation, we sought to determine whether specific residues played key roles in the sensing mechanism.
through site-directed mutagenesis of met cysteines individually and in clusters (figure s a and b), we observed that mutation of cysteines in the wd- repeat regions of met with the highest concentration of cysteine residues (wd- repeat regions and ) resulted in dysregulated met ubiquitination status (figure a) and met gene expression (figure b). specifically, conservatively mutating these cysteines to serine residues mimics the reduced state of the met protein, resulting in constitutive ubiquitination of met by met even when cells are starved of sulfur. the mixed population of ubiquitinated and deubiquitinated met in the mutant strains resulted in reduced induction of sam and gsh , while met appears to be upregulated in the mutants but is largely insensitive to the changes in the sulfur status of the cell. interestingly, a single cysteine to serine mutant, c s, phenocopies the grouped cysteine to serine mutants c / / / s (data not shown) and c / / / s. these mutants also exhibit slight growth phenotypes when cultured in both rich and −sulfur lactate media supplemented with homocysteine (figure c). furthermore, these point mutants only affect met ubiquitination in the context of sulfur starvation, as strains expressing these mutants exhibited a normal response to cadmium as evidenced by rapid deubiquitination of met (figure s c).
met cysteine oxidation disrupts ubiquitination and binding of met in vitro
having observed that met cysteine redox status is correlated with met ubiquitination status in vivo, we next sought to determine whether the sulfur/redox-sensing ability of scfmet e ligase activity could be reconstituted in vitro. to this end, we performed large scale immuno-purifications of scfmet -flag to pull down met and its interacting partners in both high and low sulfur conditions for in vitro ubiquitination assays with recombinantly purified e , e , and met (figure a).
initial in vitro ubiquitination experiments showed little difference in activity between the two conditions, mirroring prior efforts to demonstrate differential activity of the met e ligase in response to stimuli that affect its activity in vivo (figure s a) (barbey et al., ). since the cysteine residues within met became rapidly oxidized in sulfur-free conditions, the addition of dtt as a standard component in our ip buffer and in in vitro ubiquitination reactions could potentially reduce oxidized met cysteines and alter its ubiquitination activity towards met . to test this possibility, we next performed the met ip and in vitro assay in the complete absence of reducing agent. strikingly, we observed little to no ubiquitination activity in these conditions (fig. s b), suggesting that oxidized met exhibits significantly reduced ubiquitination activity. to more rigorously test the effect of reducing agents on the activity of immunopurified scfmet , we performed in parallel the met -flag ip with cells grown in both high and low sulfur conditions, with and without reducing agent in the ip. silver stains of the eluted co-ip met complexes showed similar levels of total protein overall and little difference in the abundance of major binding partners between the four conditions (figure s c). western blots of the co-ip samples for the cdc /cullin scaffold showed similar binding between the samples with the exception of the −sulfur, −dtt sample which had approximately a third of the amount of cdc bound to met (figure s d).
we suspect this difference is due to the canonical regulation of scf e ligases, which uses cyclic changes in the affinity of skp /f-box protein heterodimers to the cullin scaffold based on binding between the f-box protein and its substrate (reitsma et al., ). after performing the initial ip and washing the beads in buffer with and without reducing agent, the final wash step and flag peptide elution were done without reducing agent in the buffer for all four ip conditions in order to remove any residual reducing agent from the final ubiquitination reaction, which was also performed without reducing agent. a small aliquot of the rich and −sulfur “−dtt” immunopurified scfmet was transferred to a new tube and treated with mm tcep, a non-thiol, phosphine-based reducing agent, for approximately min while the in vitro ubiquitination assays were set up to test if the low activity of the oxidized scfmet complex could be rescued by treating with another reducing agent before addition to the final reaction. the data clearly demonstrate that the presence of reducing agent in the ip and wash buffer, but not in the elution or final reaction, significantly increased the e ligase activity of scfmet in vitro regardless of whether the cells were grown in high (figure c) or low sulfur media (figure d). further supporting our hypothesis, brief treatment of the oxidized −dtt ip complex with tcep (−dtt/+tcep) rescued the activity of the e complex in vitro (figures b and c). the same +/− dtt in vitro ubiquitination experiment done with the c s and c / / / s met mutants showed lower e ligase activity overall relative to wild type met , but smaller differences between the plus and minus reducing agent condition (figure s a). as scfmet e ligase activity in vitro is independent of the sulfur-replete or -starved state of the cells from which the co-ip concentrate is produced, and as the activity of the scfmet co-ip
concentrate purified in the absence of reducing agent can be rescued by treatment with another reducing agent, we hypothesized that the low e ligase activity of scfmet purified in the absence of reducing agent is due to decreased binding between met and met , and not decreased binding between met and the other core scf components. to test this possibility, lysate for “rich” and “−sulfur” cells was prepared and each was split into three groups, with either reducing agent (+dtt), the thiol-specific oxidizing agent tetramethylazodicarboxamide (+diamide), or control (−dtt) (figure a). met -flag ips were performed as previously described for the in vitro ubiquitination assay, except instead of eluting met off of the beads, the +dtt, −dtt, and +diamide beads were each split into two tubes containing ip buffer ±dtt and bacterially purified met . the beads were incubated with purified met prior to washing with ip buffer with or without dtt. we observed a clear, dtt-dependent increase in the fraction of met bound to the met -flag beads, with the “+dtt” met ip showing a larger initial amount of bound met compared to the “−dtt” met ip, with even less met bound to the “+diamide” met -flag beads. consistent with our hypothesis, the addition of dtt to the met co-ip with “−dtt” or “+diamide” met -flag beads restored the met /met interaction to the degree seen in the “+dtt” met -flag beads. we then performed the same experiment with our met cysteine point mutants. the amount of met bound to these mutants was less sensitive to the presence or absence of reducing agent (figure s b).
collectively, these data suggest that the reduced form of key cysteine residues in met enables it to engage its met substrate and facilitate ubiquitination.
discussion
the unique redox chemistry offered by sulfur and sulfur-containing metabolites renders many of the biochemical reactions required for life possible. the ability to carefully regulate the levels of these sulfur-containing metabolites is of critical importance to cells as evidenced by an exquisite sulfur-sparing response. sulfur starvation induces the transcription of met genes and specific isozymes, which themselves contain few methionine and cysteine residues (fauchon et al., ). furthermore, along with the dedicated cell cycle f-box protein cdc , met is the only other essential f-box protein in yeast, linking sulfur metabolite levels to cell cycle progression (su et al., , su et al., ). our findings highlight the intimate relationship between sulfur metabolism and redox chemistry in cellular biology, revealing that the key sensor of sulfur metabolite levels in yeast, met , is regulated by reversible cysteine oxidation. such oxidation of met cysteines in turn influences the ubiquitination status and transcriptional activity of the master sulfur metabolism transcription factor met . while much work has been done to characterize the molecular basis of sulfur metabolic regulation in yeast between met and met , this work describes the biochemical basis for sulfur sensing by the met e ligase (figure ). the ability of met to act as a cysteine redox-responsive e ligase is unique in saccharomyces cerevisiae, but is reminiscent of the redox-responsive keap e ligase in humans. in humans, keap ubiquitinates and degrades its nrf substrate to regulate the cellular response to oxidative
stress. when cells are exposed to electrophilic metabolites or oxidative stress, key cysteine residues are either alkylated or oxidized into disulfides, resulting in conformational changes that, in turn, disrupt keap association with either cul or nrf , both leading to nrf activation (yamamoto et al., ). our data suggest that in response to sulfur starvation, met can still maintain its association with the scf e ligase cullin scaffold, but that treatment of the oxidized complex with reducing agent is sufficient to stimulate ubiquitination of met in vitro. this, along with the in vivo and in vitro met cysteine point mutant data, leads us to conclude that it is the ability of met to bind its substrate met that is being disrupted by cysteine oxidation. previous work on the yeast response to cadmium toxicity demonstrated that met is stripped from scf complexes by the p /cdc segregase upon treatment with cadmium, suggesting that like keap , met can utilize both dissociation from scf complexes and disrupted interaction with met to modulate met transcriptional activation (barbey et al., , yen et al., ). recent work on the sensing of oxidative stress by keap has found that multiple cysteines in keap can act cooperatively to form disulfides, and that the use of multiple cysteines to form different disulfide bridges creates an “elaborate fail-safe mechanism” to sense oxidative stress (suzuki et al., ). in light of our findings, we suspect met might similarly use multiple cysteine residues in a cooperative disulfide formation mechanism to disrupt the binding interface between met and met , but more work will be needed to demonstrate this definitively.
it is worth noting the curious spacing and clustering of cysteine residues in met , with the highest density and closest spacing of cysteines found in two wd- repeats that are expected to be directly across from each other in the d structure (figure a). that the mutation of these cysteine clusters to serine has the largest in vivo effect, but mutation of any one cysteine to serine (with the notable exception of cys ) has no effect, implies some built-in redundancy in the cysteine-based redox-sensing mechanism (figure s b). we speculate that the oxidation of the cysteines in the wd- repeat region of met works cooperatively to produce structural changes that position cys to make a key disulfide linkage that disrupts the interaction with met . it was previously hypothesized that an observed, faster-migrating proteoform of met might be involved in the regulation of sulfur metabolism (sadhu et al., ). we deduced that the lower form of met does appear to be the result of transcriptionally-guided, alternative translational initiation. however, this faster-migrating proteoform appears dispensable for sulfur metabolic regulation under the conditions we examined. it is curious that such an ostensibly obvious feedback loop between met and met would appear to have little to no effect on sulfur metabolic regulation. however, during sulfur starvation, a decrease in global translation coincides with an increase in ribosomes containing one, instead of two, methyl groups at universally conserved, tandem adenosines near the ’end of s rrna (liu et al.). we speculate that these ribosomes might preferentially translate met gene mrnas, as well as preferentially initiate translation at the internal , , and th methionine residues of met .
the utilization of a redox mechanism for met draws interesting comparisons to the regulation of met via ubiquitination in that both mechanisms are rapid and readily reversible, require no new rna or protein synthesis, and consume no sulfur equivalents, sparing them for use in met gene translation under conditions of sulfur scarcity. it is also striking that while met contains many cysteine residues, met contains none – which has the consequence that as met cysteines are oxidized, there is no possibility that met can make an intermolecular disulfide linkage that might interfere with its release and recruitment to the promoters of met genes. upon repletion of sulfur metabolites, cellular reducing capacity is restored, and met cysteine reduction couples the regulation of met gene activation to sulfur assimilation, both of which require significant reducing equivalents. lastly, we highlight the observation that nearly all of the met protein becomes rapidly oxidized within min of sulfur starvation, in contrast to other nucleocytosolic proteins (fig. b). bulk levels of oxidized versus reduced glutathione are also minimally changed within this timeframe. these considerations suggest either that met is located in a redox-responsive microenvironment within cells, or that key cysteine residues such as cys are predisposed to becoming oxidized to subsequently inhibit binding and ubiquitination of met . future structural characterization of scfmet in its reduced and oxidized states may reveal the underlying basis of its exquisite sensitivity to, and regulation by, oxidation. nonetheless, along with soxr and oxyr transcription factors in e.
coli (imlay, ), the yap transcription factor in yeast (herrero et al., ), and keap in mammalian cells, our studies add the f-box protein met to the exclusive list of bona fide cellular redox sensors that can initiate a transcriptional response. acknowledgments we thank members of the tu lab, deepak nijhawan, hongtao yu, and george demartino for helpful discussions. this work was supported by nih r gm , r gm , and an hhmi-simons faculty scholars award to b.p.t. author contributions this study was conceived by z.j. and b.p.t. b.m.s. performed met cysteine point mutant strain construction, y.w. performed cysteine point mutant cloning and cdc protein purification, and all remaining experiments were directed and performed by z.j. the paper was written by z.j. and b.p.t. and has been approved by all authors. declaration of interests the authors declare no competing interests. experimental procedures yeast strains, construction, and growth media the prototrophic cen.pk strain background (van dijken et al., ) was used in all experiments. strains used in this study are listed in table s . gene deletions were carried out using either tetrad dissection or standard pcr-based strategies to amplify resistance cassettes with appropriate flanking sequences, and replacing the target gene by homologous recombination (longtine et al., ). c-terminal epitope tagged strains were similarly made with the pcr-based method to amplify resistance cassettes with flanking sequences.
point mutations were made by cloning the gene into the tagging plasmids, making the specific point mutation(s) by pcr, and amplifying and transforming the entire gene locus and resistance markers with appropriate flanking sequences using the lithium acetate method. media used in this study: ypl ( % yeast extract, % peptone and % lactate); sulfur-free glucose and lactate media (sfd/l) media composition is detailed in table s , with glucose or lactate diluted to % each; ypd ( % yeast extract, % peptone and % glucose). whole cell lysate western blot preparation five od units of yeast culture were quenched in % tca for min, pelleted, washed with % etoh, and stored at − °c. cell pellets were resuspended in µl etoh containing mm pmsf and lysed by bead beating. the lysate was separated from beads by inverting the screwcap tubes, puncturing the bottom with a g needle, and spinning the lysate at , xg into an eppendorf for min. beads were washed with µl of etoh and spun again before discarding the bead-containing screwcap tube and pelleting protein extract at , xg for min in the new eppendorf tube. the etoh was aspirated and etoh precipitated protein pellets were resuspended in µl of sample buffer ( mm tris ph . , % sds, % glycerol, . mg/ml bromophenol blue), heated at °c for min, and debris was pelleted at , xg for min. dtt was added to a final concentration of mm and incubated at rt for min before equivalent amounts of protein were loaded onto nupage - % bis-tris or - % tris-acetate gels. for protein samples modified with mpeg k-mal, an aliquot of the sample buffer resuspended protein pellets was moved to a fresh eppendorf and sample buffer containing mm mpeg k-mal was added for a final concentration of mm mpeg k-mal before heating at °c for min, pelleting debris, and adding dtt. western blots western blots were carried out by transferring whole cell lysate extracts or in vitro ubiquitination or binding assay samples onto . 
micron nitrocellulose membranes and wet transfers were carried out at ma constant for min at °c. membranes were incubated with ponceau s, washed with tbst, blocked with % milk in tbst for h, and incubated with : mouse anti-flag m antibody (sigma, cat#f ), : mouse anti-ha ( ca ) (roche, ref# ), : , rabbit anti-rpn (abcam, ab ), or : goat anti-cdc (santa cruz, yc- ) in % milk in tbst overnight at °c. after discarding primary antibody, membranes were washed times for min each before incubation with appropriate hrp-conjugated secondary antibody for h in % milk/tbst. membranes were then washed times for min each before incubating with pierce ecl western blotting substrate and exposing to film. rna extraction and real time quantitative pcr (rt-qpcr) analysis rna isolation of five od units of cells under different growth conditions was carried out following the manufacturer's manual using the masterpure yeast rna purification kit (epicentre). rna concentration was determined using an absorption spectrometer. μg rna was reverse transcribed to cdna using superscript iii reverse transcriptase from invitrogen. cdna was diluted : and real-time pcr was performed in triplicate with iq sybr green supermix from biorad. transcript levels of genes were normalized to act . all the primers used in rt-qpcr have efficiency close to %, and their sequences are listed below.
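Normalizing a transcript to a reference gene such as act is commonly done with the comparative ct method. The sketch below assumes that method and assumes product doubles every cycle. The source states only that transcript levels were normalized to act and that primer efficiencies were high, so the function names, the default amplification factor, and the ct values in the usage example are hypothetical.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, amplification_factor=2.0):
    """Expression of a target transcript relative to a reference transcript.

    target_ct, reference_ct: ct values from technical replicates.
    amplification_factor: fold increase in product per cycle. 2.0 means
    perfect doubling, which is an assumption; adjust it to the measured
    primer efficiency.
    """
    delta_ct = mean(target_ct) - mean(reference_ct)
    return amplification_factor ** (-delta_ct)


def fold_change(treated, control):
    """Ratio of two reference-normalized expression values."""
    return treated / control


if __name__ == "__main__":
    # Hypothetical triplicate ct values, for illustration only.
    rich = relative_expression([26.0, 26.0, 26.0], [18.0, 18.0, 18.0])
    starved = relative_expression([22.0, 22.0, 22.0], [18.0, 18.0, 18.0])
    print("fold change on starvation:", fold_change(starved, rich))
```

Averaging the technical triplicates before taking the difference keeps the calculation in ct space, where replicate noise is roughly additive.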
act _rt_f tccggtgatggtgttactca
act _rt_r ggccaaatcgattctcaaaa
met _rt_f cggtttcggtggtgtcttat
met _rt_r caacaacttgagcaccagaaag
gsh _rt_f caccgatgtggaaactgaaga
gsh _rt_r ggcataggattggcgtaaca
sam _rt_f cagagggtttgcctttgacta
sam _rt_r ctggtctcaaccacgctaaa
metabolite extraction and quantitation intracellular metabolites were extracted from yeast using a previously established method (tu et al., ). briefly, at each time point, ~ . od units of cells were rapidly quenched to stop metabolism by addition into . ml quenching buffer containing % methanol and mm tricine, ph . . after holding at - °c for at least min, cells were spun at , xg for min at °c, washed with ml of the same buffer, and then resuspended in ml extraction buffer containing % ethanol and . % formic acid. intracellular metabolites were extracted by incubating at °c for min, followed by incubation at °c for min. samples were spun at , xg for min to pellet cell debris, and . ml of the supernatant was transferred to a new tube. after a second spin at , xg for min, . ml of the supernatant was transferred to a new tube. metabolites in the extraction buffer were dried using a speedvac and stored at − °c until analysis. methionine, sam, sah, cysteine, gsh and other cellular metabolites were quantitated by lc-ms/ms with a triple quadrupole mass spectrometer ( qtrap, ab sciex) using previously established methods (tu et al., ). briefly, metabolites were separated chromatographically on a c -based column with polar embedded groups (synergi fusion-rp, . mm micron, phenomenex), using a shimadzu prominence lc /sil- ac hplc-autosampler coupled to the mass spectrometer. flow rate was . ml/min using the following
method: buffer a: . % h o/ . % formic acid, buffer b: . % methanol / . % formic acid. t = min, % b; t = min, % b; t = min, % b; t = min, % b; t = min, % b; t = min, % b; t = min, stop. for each metabolite, a mm standard solution was infused into an applied biosystems qtrap triple quadrupole-linear ion trap mass spectrometer to optimize quantitative detection of daughter ions upon collision-induced fragmentation of the parent ion [multiple reaction monitoring (mrm)]. the parent ion mass was scanned for first in positive mode (usually mw + ). for each metabolite, the optimized parameters for quantitation of the two most abundant daughter ions (i.e., two mrms per metabolite) were selected for inclusion in further method development. for running samples, dried extracts (typically . od units) were resuspended in ml . % formic acid, spun at , xg for min at °c, and µl was moved to a fresh eppendorf. the µl was spun again at , xg for min at °c, and µl was moved to mass-spec vials for injection (typically µl injection volume). the retention time for each mrm peak was compared to an appropriate standard. the area under each peak was then quantitated using analyst® . . , and peaks were re-inspected for accuracy. normalization was done by dividing the total spectral counts of a given metabolite by the od units of the sample. data represent the average of two biological replicates. protein purification xhis-uba (e ) was purified as previously described (petroski and deshaies, ), with the exception that the strain was made in the cen.pk background and the his -tag was appended to the n-terminus of uba . additionally, lysis was performed by cryomilling frozen yeast pellets by adding the pellet to a pre-cooled ml milling jar containing a mm stainless steel ball. yeast cell lysis was performed by milling in cycles at hz for min and chilling in liquid nitrogen for min.
lysate was made by adding ml of buffer for every gram of cryomilled yeast powder, and clarification was performed at , xg instead of , xg. cdc - xhis (e ) similarly was purified according to previously described protocols (petroski and deshaies, ), with the following exceptions: the cdc orf was cloned into phis parallel vector such that the n-terminal his tag was eliminated from the vector while incorporating a c-terminal xhis tag by pcr. bl transformants were grown in lb medium and expression was induced by addition of . mm iptg. cells were lysed by sonication and clarification was done by spinning at , xg for min at °c before the ni-nta purification was performed as previously described (petroski and deshaies, ). his-sumo-met -strep-tagii-ha was purified by cloning the met orf into pet his sumo vector while incorporating a c-terminal strep-tagii and a single ha tag by pcr. bl transformants were grown in liters lb medium and induced by addition of . mm iptg overnight at °c at rpm. cell pellets were collected and lysed by sonication in buffer containing mm tris ph . , mm nacl, % glycerol, mm imidazole, mm pmsf, µm leupeptin, mm naf, µm pepstatin, . % np- , and x roche edta-free protease inhibitor cocktail tablet. lysate was clarified by centrifugation at , xg for min at °c and the supernatant was transferred to a ml conical, and met was batch purified with . ml of ni-nta agarose by incubating for min at °c. after spinning down the ni-nta agarose, the supernatant was removed and the agarose was resuspended in the same buffer and moved to a gravity flow column and washed times with mm tris ph .
, mm nacl, % glycerol, and mm imidazole before elution with the same buffer containing mm imidazole. eluted met was then run over ml of strep-tactin sepharose in a ml gravity flow column, washed with cvs strep-tactin wash buffer ( mm tris ph . , mm nacl), and eluted by diluting ml x strep-tactin elution buffer in ml strep-tactin wash buffer and collecting . ml fractions. fractions containing pure, full-length met were pooled and concentrated while exchanging the buffer with buffer containing mm tris ph . , mm nacl, mm mgcl , % glycerol, and mm dtt. protein concentration was measured and mg/ml aliquots were made and stored at − °c. scfmet -flag ip and in vitro ubiquitination assay strains containing flag-tagged met were grown in rich ypl media overnight to mid-late log phase before dilution with more ypl and growth for h before half of the culture was separated and switched to −sulfur sfl media for min. subsequently, approximately od units each of ypl and sfl cultured yeast were spun down and frozen in liquid nitrogen. frozen yeast pellets were cryomilled by adding the pellet to a pre-cooled ml milling jar containing a mm stainless steel ball. yeast cell lysis was performed by milling in cycles at hz for min and chilling in liquid nitrogen for min. cryomilled yeast powder (~ grams) was moved to a ml conical and resuspended in ml scf ip buffer ( mm tris ph . , mm nacl, mm naf, % np- , mm edta, % glycerol) containing µm leupeptin, mm pmsf, µm pepstatin, µm sodium orthovanadate, mm , -phenanthroline, µm mln , x roche edta-free protease inhibitor cocktail tablet, and mm dtt when specified. small molecule inhibitors of neddylation and deneddylation were included and, along with a short ip time, were intended to minimize exchange and preserve f-box protein/skp substrate recognition modules (reitsma et al., ).
the lysate was then briefly sonicated to shear dna and subsequently clarified at , xg for min and the supernatant was incubated with µl of thermo fisher protein g dynabeads (cat# d) dmp crosslinked to µl of mouse anti-flag m antibody (sigma, cat#f ) for min at °c. the agarose was pelleted at xg for min, the supernatant was aspirated, and the magnetic beads were transferred to an eppendorf tube. the beads were washed times with ml scf ip buffer with or without dtt before elution with mg/ml flag peptide in pbs. the eluate was concentrated in amicon ultra- . centrifugal filter units with kda mw cutoffs to a final volume of ~ µl. silver stains of the ips were carried out using the pierce silver stain for mass spectrometry kit (cat# ) according to the manufacturer's protocol. the in vitro ubiquitination assay was performed by placing a pcr tube on ice and adding to it µl of water, µl of x ubiquitination assay buffer ( mm tris ph . , mm atp, mm mgcl , % glycerol), . µl uba (fc = nm), . µl cdc (fc = nm), . µl yeast ubiquitin (boston biochem, fc = . µm) and incubating at rt for min. the pcr tubes were then placed back on ice and µl of water, µl of x ubiquitination assay buffer, µl of concentrated scfmet -flag ip, and µl of purified met (fc = nm) were added, the tubes were moved back to rt, and µl aliquots of the reaction were removed, mixed with x sample buffer, and frozen in liquid nitrogen over the time course.
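The final concentrations (fc) quoted for each assay component follow from the dilution relation C1·V1 = C2·V2. A minimal sketch of that arithmetic is given below. The function names and every numeric value are placeholders, not the stock concentrations or volumes used in this protocol.

```python
def final_concentration(stock_conc, stock_volume, total_volume):
    """Concentration after diluting stock_volume of stock into total_volume.

    The concentration unit is preserved; both volumes must share one unit.
    """
    if total_volume <= 0:
        raise ValueError("total_volume must be positive")
    return stock_conc * stock_volume / total_volume


def stock_volume_needed(stock_conc, target_conc, total_volume):
    """Volume of stock required to reach target_conc in total_volume."""
    if stock_conc <= 0:
        raise ValueError("stock_conc must be positive")
    if target_conc > stock_conc:
        raise ValueError("target concentration exceeds stock concentration")
    return target_conc * total_volume / stock_conc


if __name__ == "__main__":
    # Hypothetical example: a 1000 nM stock diluted to 100 nM in 20 µl.
    volume = stock_volume_needed(stock_conc=1000, target_conc=100, total_volume=20)
    print(f"pipette {volume} µl of stock")
    print(f"check: {final_concentration(1000, volume, 20)} nM final")
```

Because the reaction is assembled in two stages, the total volume passed in should be the volume after the second set of additions if the quoted fc refers to the complete reaction.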
scfmet -flag ip and in vitro met binding assay for the met binding assay, yeast cell lysate was prepared as described for the ubiquitination experiment, except that the lysate was split three ways, with mm dtt, mm tetramethylazodicarboxamide (diamide) (sigma, cat#d ), or nothing added to the lysate prior to centrifugation at , xg for min at °c. the supernatant was transferred to new tubes and µl of thermo fisher protein g dynabeads (cat# d) dmp crosslinked to µl of mouse anti-flag m antibody (sigma, cat#f ) was divided evenly between the six met -flag ip conditions and incubated for h at °c while rotating end over end. after incubation, the beads were washed twice with ip buffer containing mm dtt, mm diamide, or nothing before a final wash with plain ip buffer. each set of met -flag bound beads prepared in the different ip conditions was brought up to µl with plain ip buffer, and µl was dispensed to new tubes containing ml of ip buffer ± mm dtt and µg of purified recombinant met ; these were incubated for h at °c while rotating end over end, for a total of twelve met co-ip conditions. the beads were then collected, washed times with ip buffer ± mm dtt, resuspended in µl x sample buffer, and heated at °c for min before western blotting for both met and met . references barbey, r., baudouin-cornu, p., lee, t. a., rouillon, a., zarzov, p., tyers, m. & thomas, d. . inducible dissociation of scf(met ) ubiquitin ligase mediates a rapid transcriptional response to cadmium. embo j, , - . blaiseau, p. l. & thomas, d. .
multiple transcriptional activation complexes tether the yeast activator met to dna. embo j, , - . cantoni, g. l. . biological methylation: selected aspects. annu rev biochem, , - . cuozzo, j. w. & kaiser, c. a. . competition between glutathione and protein thiols for disulphide-bond formation. nature cell biology, , - . fauchon, m., lagniel, g., aude, j.-c., lombardia, l., soularue, p., petat, c., marguerie, g., sentenac, a., werner, m. & labarre, j. . sulfur sparing in the yeast proteome in response to sulfur demand. molecular cell, , - . flick, k., ouni, i., wohlschlegel, j. a., capati, c., mcdonald, w. h., yates, j. r. & kaiser, p. . proteolysis-independent regulation of the transcription factor met by a single lys -linked ubiquitin chain. nat cell biol, , - . flick, k., raasi, s., zhang, h., yen, j. l. & kaiser, p. . a ubiquitin-interacting motif protects polyubiquitinated met from degradation by the s proteasome. nat cell biol, , - . hansen, j. & johannesen, p. f. . cysteine is essential for transcriptional regulation of the sulfur assimilation genes in saccharomyces cerevisiae. molecular and general genetics mgg, , - . herrero, e., ros, j., bellÍ, g. & cabiscol, e. . redox control and oxidative stress in yeast cells. biochimica et biophysica acta (bba)-general subjects, , - . imlay, j. a. . the molecular mechanisms and physiological consequences of oxidative stress: lessons from a model bacterium. nature reviews microbiology, , - . kaiser, p., flick, k., wittenberg, c. & reed, s. i. . regulation of transcription by ubiquitination without proteolysis: cdc /scfmet -mediated inactivation of the transcription factor met . cell, , - . kato, m., yang, y. s., sutter, b. m., wang, y., mcknight, s. l. & tu, b. p. . redox state controls phase separation of the yeast ataxin- protein via reversible oxidation of its methionine-rich low-complexity domain. cell, , - e . kuras, l., cherest, h., surdin-kerjan, y. & thomas, d. . 
a heteromeric complex containing the centromere binding factor and two basic leucine zipper factors, met and met , mediates the transcription activation of yeast sulfur metabolism. embo j, , - . kuras, l., rouillon, a., lee, t., barbey, r., tyers, m. & thomas, d. . dual regulation of the met transcription factor by ubiquitin-dependent degradation and inhibition of promoter recruitment. mol cell, , - . laxman, s., sutter, b. m., wu, x., kumar, s., guo, x., trudgian, d. c., mirzaei, h. & tu, b. p. . sulfur amino acids regulate translational capacity and metabolic homeostasis through modulation of trna thiolation. cell, , - . liu, k., santos, d. a., hussmann, j. a., sutter, b. m., wang, y., weissman, j. s. & tu, b. p. regulation of translation by s rrna methylation multiplicity. ljungdahl, p. o. & daignan-fornier, b. . regulation of amino acid, nucleotide, and phosphate metabolism in saccharomyces cerevisiae. genetics, , - . longtine, m. s., mckenzie, a., rd, demarini, d. j., shah, n. g., wach, a., brachat, a., philippsen, p. & pringle, j. r. . additional modules for versatile and economical pcr-based gene deletion and modification in saccharomyces cerevisiae. yeast, , - . menant, a., baudouin-cornu, p., peyraud, c., tyers, m. & thomas, d. . determinants of the ubiquitin-mediated degradation of the met transcription factor. j biol chem, , - . miller, a. w., befort, c., kerr, e. o. & dunham, m. j. . design and use of multiplexed chemostat arrays. jove (journal of visualized experiments), e . penninckx, m. .
a short review on the role of glutathione in the response of yeasts to nutritional, environmental, and oxidative stresses. enzyme microb technol, , - . petroski, m. d. & deshaies, r. j. . in vitro reconstitution of scf substrate ubiquitination with purified proteins. methods enzymol, , - . pompella, a., visvikis, a., paolicchi, a., de tata, v. & casini, a. f. . the changing faces of glutathione, a cellular protagonist. biochem pharmacol, , - . reitsma, j. m., liu, x., reichermeier, k. m., moradian, a., sweredoski, m. j., hess, s. & deshaies, r. j. . composition and regulation of the cellular repertoire of scf ubiquitin ligases. cell, , - . e . rouillon, a., barbey, r., patton, e. e., tyers, m. & thomas, d. . feedback-regulated degradation of the transcriptional activator met is triggered by the scf(met ) complex. embo j, , - . sadhu, m. j., moresco, j. j., zimmer, a. d., yates, j. r., rd & rine, j. . multiple inputs control sulfur-containing amino acid synthesis in saccharomyces cerevisiae. mol biol cell, , - . su, n. y., flick, k. & kaiser, p. . the f-box protein met is required for multiple steps in the budding yeast cell cycle. mol cell biol, , - . su, n. y., ouni, i., papagiannis, c. v. & kaiser, p. . a dominant suppressor mutation of the met cell cycle defect suggests regulation of the saccharomyces cerevisiae met -cbf transcription complex by met . j biol chem, , - . sutter, b. m., wu, x., laxman, s. & tu, b. p. . methionine inhibits autophagy and promotes growth by inducing the sam-responsive methylation of pp a. cell, , - .
suzuki, t., muramatsu, a., saito, r., iso, t., shibata, t., kuwata, k., kawaguchi, s. i., iwawaki, t., adachi, s., suda, h., morita, m., uchida, k., baird, l. & yamamoto, m. . molecular mechanism of cellular oxidative stress sensing by keap . cell rep, , - e . thomas, d., kuras, l., barbey, r., cherest, h., blaiseau, p. l. & surdin-kerjan, y. . met p, a yeast transcriptional inhibitor that responds to s-adenosylmethionine, is an essential protein with wd repeats. mol cell biol, , - . tu, b. p., mohler, r. e., liu, j. c., dombek, k. m., young, e. t., synovec, r. e. & mcknight, s. l. . cyclic changes in metabolic state during the life of a yeast cell. proc natl acad sci u s a, , - . van dijken, j. p., bauer, j., brambilla, l., duboc, p., francois, j. m., gancedo, c., giuseppin, m. l., heijnen, j. j., hoare, m., lange, h. c., madden, e. a., niederberger, p., nielsen, j., parrou, j. l., petit, t., porro, d., reuss, m., van riel, n., rizzi, m., steensma, h. y., verrips, c. t., vindelov, j. & pronk, j. t. . an interlaboratory comparison of physiological and genetic properties of four saccharomyces cerevisiae strains. enzyme microb technol, , - . wu, g., fang, y. z., yang, s., lupton, j. r. & turner, n. d. . glutathione metabolism and its implications for health. j nutr, , - . wu, x. & tu, b. p. . selective regulation of autophagy by the iml -npr -npr complex in the absence of nitrogen starvation. mol biol cell, , - . yamamoto, m., kensler, t. w. & motohashi, h. . the keap -nrf system: a thiol-based sensor-effector apparatus for maintaining redox homeostasis. physiol rev, , - . yang, y.
s., kato, m., wu, x., litsios, a., sutter, b. m., wang, y., hsu, c. h., wood, n. e., lemoff, a., mirzaei, h., heinemann, m. & tu, b. p. . yeast ataxin- forms an intracellular condensate required for the inhibition of torc signaling during respiratory growth. cell, , - e . ye, c., sutter, b. m., wang, y., kuang, z. & tu, b. p. . a metabolic function for phospholipid and histone methylation. mol cell, , - e . ye, c., sutter, b. m., wang, y., kuang, z., zhao, x., yu, y. & tu, b. p. . demethylation of the protein phosphatase pp a promotes demethylation of histones to enable their function as a methyl group sink. mol cell, , - e . yen, j. l., flick, k., papagiannis, c. v., mathur, r., tyrrell, a., ouni, i., kaake, r. m., huang, l. & kaiser, p. . signal-induced disassembly of the scf ubiquitin ligase complex by cdc /p . mol cell, , - . yen, j. l., su, n. y. & kaiser, p. . the yeast ubiquitin ligase scfmet regulates heavy metal response. mol biol cell, , - . figure legends figure . met and met response to sulfur starvation and repletion under respiratory growth conditions. (a) western blot analysis of a time course performed with yeast containing endogenously tagged met and met that were cultured in rich lactate media (rich) overnight to mid log phase before switching cells to sulfur-free lactate media (−sulfur) for h, followed by the addition of a mix of the sulfur containing metabolites methionine, homocysteine, and cysteine at . mm each (+met/cys/hcy). (b) expression of met gene transcript levels was assessed by qpcr over the time course shown in (a).
data are presented as mean and sem of technical triplicates. (c) levels of key sulfur metabolites were measured over the same time course as in (a) and (b), as determined by lc-ms/ms. data represent the mean and sd of two biological replicates. (d) met ∆ or str ∆ strains were grown in “rich” ypl and switched to “−sulfur” sfl for h to induce sulfur starvation before the addition of either . mm homocysteine (+hcy), . mm methionine (+met), or . mm cysteine (+cys). (e) simplified diagram of the sulfur metabolic pathway in yeast. figure . met cysteine residues become oxidized during sulfur starvation. (a) schematic of met protein architecture and cysteine residue location. (b) western blot analysis of met cysteine redox state in lactate media as determined by methoxy-peg-maleimide (mpeg k-mal) modification of reduced protein thiols. for every reduced cysteine thiol in a protein, mpeg k-mal adds ~ kda in apparent molecular weight. (c) same western blot analysis as in (b), except that yeast were cultured in sulfur-free glucose media (sfd) for h before the addition of . mm each of the sulfur metabolites homocysteine, methionine, and cysteine (+met/cys/hcy). (d) yeast were subjected to the same rich to −sulfur media switch as in (b), except that following the min time point, mm dtt was added to the culture for min and met cysteine residue redox state and met ubiquitination were assessed by western blot. figure . met cysteine point mutants display dysregulated sulfur sensing. (a) western blot analysis of met cysteine redox state and met ubiquitination status in wt and two cysteine to serine mutants, c s and c / / / s. (b) met gene transcript levels over the same time course as (a) for the three strains, as assessed by qpcr. data are presented as mean and sem of technical triplicates. (c) growth curves of the three yeast strains used in (a) and (b) in sulfur-rich ypl media or −sulfur sfl media supplemented with . mm homocysteine. 
cells were grown to mid-log phase in ypl media before pelleting, washing with water, and back-diluting yeast into the two media conditions. data represent mean and sd of technical triplicates.

figure . met cysteine oxidation disrupts ubiquitination and reduces binding to met in vitro. (a) schematic for the large-scale scfmet -flag immunopurification from rich high sulfur (ypl) and −sulfur (sfl) conditions for use in in vitro ubiquitination or binding assays with recombinant met protein. (b) western blot analysis of met in vitro ubiquitination by scfmet -flag immunopurifications from cells cultured in sulfur-replete rich media. cryomilled ypl yeast powder was divided evenly for two flag ips performed identically with the exception that one was done in the presence of mm dtt (+dtt) and the other was performed without reducing agent present (−dtt). to test if the addition of reducing agent could rescue the activity of the “−dtt” ip, a small aliquot of the “−dtt” scfmet -flag complex was transferred to a new tube and was treated briefly with mm tcep while the in vitro ubiquitination reaction was set up (−dtt/+tcep). the first three lanes are negative control reactions performed either without scfmet -flag ip, recombinant met , or ubiquitin. (c) the same western blot analysis of met in vitro ubiquitination as in (b), except that the scfmet -flag complex was produced from −sulfur sfl cells. (d) western blot analysis of the met binding assay illustrated in (a).
rich and −sulfur lysate were both split three ways, and lysate with mm dtt (+dtt), mm diamide (+diamide), or control (−dtt) were incubated with anti-flag magnetic beads to isolate met -flag complex. the met -flag bound beads from each condition were then split in half and distributed into tubes containing ip buffer ± mm dtt and purified recombinant met . the mixture was allowed to incubate for h before the beads were washed, boiled in sample buffer, and bound proteins were separated on sds-page gels and western blots were performed for both met and met .

figure . model for sulfur-sensing and met gene regulation by the scfmet e ligase. in conditions of high sulfur metabolite levels, cysteine residues in the wd- repeat region of met are reduced, allowing met to bind and facilitate ubiquitination of met in order to negatively regulate the transcriptional activation of the met regulon. upon sulfur starvation, met cysteine residues become oxidized, resulting in conformational changes in met that allow met to be released from the scfmet complex, deubiquitinated, and transcriptionally active.

supplemental figure legends

figure s . characterization of the faster-migrating proteoform of met . (a) western blot of yeast treated with µg/ml cycloheximide during sulfur starvation demonstrates that production of the faster-migrating proteoform is dependent on new translation. (b) the faster-migrating proteoform persists after rescue from sulfur starvation when treated with a proteasome inhibitor.
cells were starved of sulfur for h to accumulate the faster-migrating proteoform, and then sulfur metabolites were added back concomitantly with mg ( µm). (c) the faster-migrating proteoform of met is dependent on met . the met ∆ yeast strain does not produce the second proteoform of met when starved of sulfur. (d) western blot analysis of strains expressing either wild type met , met d - aa, or met m / / a. yeast cells harboring the n-terminal deletion of the first twenty amino acids of met (which contain the first three methionine residues) or have the subsequent three methionine residues (m / / ) mutated to alanine do not create faster-migrating proteoforms. (e) met (d - aa) and met (m / / a) strains do not exhibit any growth phenotypes in −sulfur glucose media with or without supplemented methionine. there are also no defects in growth rate following repletion of methionine. data represent mean and sd of technical triplicates. figure s . identification of key cysteine residues in met involved specifically in sulfur amino acid sensing. (a) schematic of met protein architecture and cysteine residue location. (b) western blot analysis of various met cysteine point mutants and met ubiquitination status in rich and −sulfur media. (c) western blot analysis of met cysteine redox state and met ubiquitination status in wt and two cysteine to serine mutants, c s and c / / / s, following treatment with µm cdcl . figure s . scfmet -flag ip/in vitro ubiquitination assay demonstrating the dependence of reducing agent in the ip on scfmet e ligase activity. (a) initial ips for scfmet -flag complex were performed in the presence of mm dtt prior to flag peptide elution and concentration. no dtt was used in the in vitro ubiquitination assay itself, yet the e ligase activities for the e complex were indistinguishable between complex isolated from high sulfur versus low sulfur cells. 
(b) the same ip/in vitro assay as in (a), with the sole exception that dtt was not included during the ip and wash steps. (c) silver stains of immunopurified scfmet -flag complex isolated from rich and −sulfur cells prepared in the presence or absence of dtt used in figures b and c. (d) western blot of cdc amounts from immunopurified scfmet -flag complex shown in s c and used in figures b and c. we speculate that the reduced cdc abundance in the −sulfur, −dtt ip is the result of the canonical regulation of scf e ligases, which causes reduced association of skp /f-box heterodimers with the cdc scaffold when binding between the f-box and its substrate is reduced.

figure s . scfmet -flag ip/in vitro ubiquitination assay using met cysteine point mutants. (a) in vitro ubiquitination assays were carried out as described in figure b with cell lysate powder from wt, c s, and c / / / s met strains grown in rich media. the heavier loading of the c s mutant is likely due to a difference in cryomill lysis efficiency, and is not a difference in the amount of starting material used. (b) met binding was assessed in the c s and c / / / s mutants as described in figure d using cell lysate powder from cells grown in rich media. the fold change in met binding in the presence and absence of dtt was quantified for each strain and for each met immunopurification condition using imagej (version . ).

table s . strains used in this study.
background | genotype | source
cen.pk | mata | (van dijken et al., )
cen.pk | mata | (van dijken et al., )
cen.pk | mata; met -flag::kanmx | this study
cen.pk | mata; met -flag::kanmx met -ha::hyg | this study
cen.pk | mata; met -flag::kanmx met -ha::hyg met d::nat | this study
cen.pk | mata; met -flag::kanmx met -ha::hyg str d::nat | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c / / / s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met d::phleo ho::met -flag::nat met -ha::hyg | this study
cen.pk | mata; met d::phleo ho::met daa - -flag::nat met -ha::hyg | this study
cen.pk | mata; met d::phleo ho::met -m / / a-flag::nat met -ha::hyg | this study
cen.pk | mata; met -flag::kanmx met -ha::hyg pdr d::nat | this study
cen.pk | mata; met d::kanmx met -flag::hyg | this study
cen.pk | mata; cup p- xhis-tev-uba ::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c / s-flag::kanmx met -ha::hyg | this study
cen.pk | mata; met ::met -c s-flag::kanmx met -ha::hyg | this study

table s . recipe for sulfur-free media.
salts (g l- ): cacl • h o; nacl; mgcl • h o; nh cl; kh po
metals (mg l- ): boric acid; cucl • h o; ki; fecl • h o; mncl • h o; na moo • h o; zncl • h o
vitamins (mg l- ): biotin; calcium pantothenate; folic acid; inositol; niacin; -aminobenzoic acid; pyridoxine hcl; riboflavin; thiamine-hcl
recipes are derived from (miller et al., ).
figure (image). panels: western blots of met -ha, met -flag and rpn ; relative abundance of methionine, gsh, cysteine, gssg, cystathionine, sam and sah; relative mrna expression of met , sam and gsh ; sulfur metabolic pathway diagram.
figure (image). panels: met domain schematic (f-box, wd repeats, scf-binding and met -binding regions); mpeg k-mal western blots in lactate (respiratory) and glucose (glycolytic) media, and after +dtt treatment.
figure (image). panels: mpeg k-mal western blots for wt, c s and c / / / s; relative mrna expression of met , sam and gsh ; growth curves (od versus time (h)) in ypl and in sfl + . mm hcy.
figure (image). panels: purification and assay schematic; in vitro ubiquitination blots (+dtt, −dtt, −dtt/+tcep) from rich and −sulfur cells; met -flag ip and met -ha co-ip binding blots.
figure (image). model diagram: low versus high sulfur metabolite levels, met genes on versus off.
figure s (image). panels: western blots with chx and mg treatment; wt versus met ∆; growth curves (od versus time (h)) in +met and −sulfur glucose media.
figure s (image). panels: met domain schematic; western blots of cysteine point mutants in rich (r) and −sulfur (−s) media; mpeg k-mal blots after +cd treatment.
figure s (image). panels: in vitro ubiquitination blots from +dtt and −dtt flag purifications; silver stain; cdc western blot.
figure s (image). panels: in vitro ubiquitination blots for wt, c s and c / / / s; met pulldown (+dtt/−dtt) quantification; met -flag ip and met -ha co-ip blots.

partition quantitative assessment (pqa): a quantitative methodology to assess the embedded noise in clustered omics and systems biology data

camacho-hernández, diego a.†, nieto-caballero, victor e.†, león-burguete, josé e., and freyre-gonzález, julio a.*
regulatory systems biology research group, laboratory of systems and synthetic biology and undergraduate program in genomic sciences, center for genomic sciences, universidad nacional autónoma de méxico (unam), morelos, mexico.
† these authors contributed equally to this work.
* corresponding author: jfreyre@ccg.unam.mx

abstract: identifying groups that share common features among datasets through clustering analysis is a typical problem in many fields of science, particularly in post-omics and systems biology research.
in this respect, quantifying how well a measure can cluster or organize intrinsic groups is important, since there is currently no statistical evaluation of how ordered the resulting clustered vector is, or of how much noise is embedded in it. much of the literature focuses on how well the clustering algorithm orders the data, using several external and internal statistical measures, but no measure has been developed to statistically quantify the noise in a vector arranged by a clustering algorithm, i.e., how much of the clustering is due to randomness. here, we present a quantitative methodology, based on autocorrelation, to address this problem.

keywords: omics data; hierarchical clustering; noise quantification.

introduction

a common task in today’s research is the identification of specific markers as predictors of a classification yielded by clustering analysis of the data. for instance, this approach is particularly useful after high-throughput experiments to compare gene expression or methylation profiles among different cell lines [ ]. this task is also proving handy in the nascent field of single-cell sequencing, where clustering cells is an important step towards further classification or serves as a qualifying metric of the sequencing process [ ]. in the widely used gene expression assays, the vector of profiles for each marker across different cell lines is ordered using hierarchical clustering algorithms. these algorithms yield a dendrogram and a heat map representing the vector of marker profiles, illustrating the arrangement of the clusters. to assess how well the clustering segregates different cell lines, a class stating the desired partitioning of each cell line is provided a posteriori. then, a simple visual inspection of the vector of classes is used to estimate whether the clustering provides a good partition.
such a partition vector is colored according to the classification each item is associated with, and similar items are expected to be contiguous, so the resulting groups are assessed qualitatively against the biological background of each item. this procedure should not be confused with “supervised clustering”, which provides a vector of classes stating the desired partitioning a priori. that vector is then used to guide the clustering algorithm by allowing it to learn the distance metric that optimizes the partitioning [ ]. the procedure may also be confused with the metric assessment of clustering algorithms, especially with external cluster evaluation. various metrics, both intrinsic and extrinsic, have been developed to qualify the clustering algorithm itself, and they are used for clustering algorithm validation. extrinsic validation compares the clustering to a goal to decide whether it is a good clustering or not. internal validation compares the elements within each cluster and their differences [ ]. pqa has characteristics of both kinds of validation, since it uses both the crafted goal standard and the yielded signal itself (the clustered vector). however, pqa combines these elements not to qualify the clustering algorithm itself but to quantify the noise embedded in the clustering; this noise may be due to the intrinsic metric or marker used to order the data set.
a possible caveat of the qualitative assessment discussed above is that humans tend to perceive meaningful patterns within random data, a cognitive bias known as apophenia [ ]. while interpreting the partitions obtained from unsupervised clustering analysis, researchers attempt to visually assess how close the classifications are to each other, and may find patterns that are not well supported by the data. this effect arises because the adjacency between items may give a notion of the dissimilarity distance between the dendrogram leaves. unfortunately, as far as we know, there is no method to quantitatively assess the quality of the groups of classifications resulting from the clustering or, at least, there has been no attempt to quantify whether a certain configuration or order of the items may be due to randomness. this is a serious caveat, since the insertion of noise can lead to false conclusions or misleading results. furthermore, purging this noise can lead to more efficient descriptions of markers and their phenomena, accelerating progress in many fields.

in statistics, serial correlation (sc) is a term used to describe the relationship between observations of the same variable over specific periods. it was originally used in engineering to determine how a signal, for instance a radio wave, varies with itself over time. later, sc was adapted to econometrics to analyze economic data over time, principally to predict stock prices, and, in other fields, to model-independent random variables [ ]. we applied sc to propose a way to quantify how well a posterior classification is grouped, using only the results of unsupervised clustering analysis. thus, we propose a novel relative score, pqa, to remove the subjectivity of visual inspection and to statistically quantify how much noise is embedded in the results of clustering analysis.

methodology
assigning numeric labels to classifications

a vector denoting the putative similarities among the variables in a study is usually obtained after a clustering analysis. each variable is classified to generate a vector of profiles (vp). such a vector of classifications is usually translated into a vector of colors, in which each color represents a classification. it is common to inspect this vector to find groups that make sense according to the analyzed data. for the method presented in this work, the vp may be as simple as a vector of strings or numbers that represent the input.

figure . the pipeline of the pqa methodology.

whatever the representation of the classifications may be, it is necessary to transform them into a vector of numeric labels, in which a number represents a classification, to be able to calculate the sc. to accomplish this, we assign the first numeric label (number ) to the first item in the vector, which usually lies at one of the vector’s extremes. then, if the classification of the next item is different from the previous one, the next number in the sequence is assigned, and so on. this way of labeling ensures that changes in the sc values are due to the order of the numbers, that is to say, to the grouping of the classifications resulting from the clustering, and are not an artifact of the labeling itself (figure ).

pqa score

because the order of the vp can be interpreted as the grouping of the classifications, we measure how well identical classifications are held together in the vp through an sc shifted by one position.
this correlation is defined as the pearson product-moment correlation between the vp without its first item and the vp without its last item:

$$\rho_x = \frac{\sum_{i=1}^{n-1}\left(x_{i+1}-\bar{x}_{2:n}\right)\left(x_i-\bar{x}_{1:n-1}\right)}{\sqrt{\sum_{i=2}^{n}\left(x_i-\bar{x}_{2:n}\right)^2}\,\sqrt{\sum_{i=1}^{n-1}\left(x_i-\bar{x}_{1:n-1}\right)^2}}, \qquad \bar{x}_{2:n}=\frac{1}{n-1}\sum_{j=2}^{n}x_j, \quad \bar{x}_{1:n-1}=\frac{1}{n-1}\sum_{j=1}^{n-1}x_j$$

where $x_i$ is the i-th position of the ordered vector, $n$ is the length of $x$, and $\rho_x$ is the resulting sc.

we then define the pqa as the sc of the vp after removing background noise, normalized by the sc of the perfectly grouped partition (defined as the vector sorted in ascending order). thus, the more similar the vp is to its sorted vector, the higher the score:

$$PQA_x = \frac{\rho_x - \overline{\rho_{Rand_x}}}{\rho_{Perfect_x}}$$

where $\rho_x$ is the sc of the vp, $\overline{\rho_{Rand_x}}$ is the mean of the sc of one thousand randomizations, and $\rho_{Perfect_x}$ is the sc of the vector sorted in ascending order.

background-noise correlation factor in the pqa score

to compute the background-noise correlation factor in the pqa score definition, we sample the indexes of the vp and swap the corresponding items. this background correction aims to remove inherent noise in the data, although the score may still be subject to noise from the chosen clustering algorithm or to discrepancies in the posterior classification.

statistical significance of the pqa score

to quantify the statistical significance of the pqa score, we calculate a z-score:

$$z_x = \frac{PQA_x - \overline{PQA_{Rand}}}{SD_{PQA_{Rand}}}$$

where $PQA_x$ is the pqa score of the vp, $\overline{PQA_{Rand}}$ is the mean of the pqa scores of one thousand randomizations of the vp, and $SD_{PQA_{Rand}}$ is their standard deviation.
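as a rough illustration of the definitions above, the following python sketch computes the numeric labels, the lag-one sc, the pqa score and its z-score. it is a sketch under stated assumptions, not the authors' implementation: the function names (`numeric_labels`, `lag1_sc`, `pqa_with_zscore`), the use of numpy, the reuse of a classification's label when it reappears, the value 0.0 returned for an undefined correlation, and the sharing of one background estimate across the null vectors are all choices made here for illustration.

```python
import numpy as np


def numeric_labels(classes):
    """Map classifications to 1, 2, 3, ... in order of first appearance.

    Assumption: a classification that reappears later keeps its first label.
    """
    mapping = {}
    for c in classes:
        mapping.setdefault(c, len(mapping) + 1)
    return np.array([mapping[c] for c in classes], dtype=float)


def lag1_sc(x):
    """Pearson correlation between x without its last item and x without its first."""
    x = np.asarray(x, dtype=float)
    a = x[:-1] - x[:-1].mean()
    b = x[1:] - x[1:].mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    # The correlation is undefined when either slice is constant; returning
    # 0.0 is an arbitrary choice made for this sketch.
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def pqa_with_zscore(x, n_rand=1000, seed=0):
    """Return (pqa, z) for a numeric label vector x."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)

    # SC of the vector sorted in ascending order (the "perfect" grouping).
    rho_perfect = lag1_sc(np.sort(x))
    if rho_perfect == 0.0:
        raise ValueError("the sorted vector has no defined serial correlation")

    def shuffled_sc():
        return np.array([lag1_sc(rng.permutation(x)) for _ in range(n_rand)])

    # First randomization: background noise term of the PQA score.
    background = shuffled_sc().mean()
    pqa = (lag1_sc(x) - background) / rho_perfect

    # Second randomization: null distribution of PQA scores for the z-score.
    # Assumption: shuffled vectors share the same background and rho_perfect,
    # because both depend only on the multiset of labels.
    pqa_rand = (shuffled_sc() - background) / rho_perfect
    z = (pqa - pqa_rand.mean()) / pqa_rand.std(ddof=1)
    return float(pqa), float(z)


# Toy example: a perfectly grouped partition with three classifications.
classes = ["a"] * 10 + ["b"] * 10 + ["c"] * 10
pqa, z = pqa_with_zscore(numeric_labels(classes), n_rand=200)
```

reusing a single background estimate keeps the cost at two rounds of `n_rand` shuffles; recomputing the background for every null vector would follow the two-randomization description more literally at a much higher cost.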
the randomizations used for the z-score provide a solid random background against which the real signal is compared. the number of randomizations does not depend on the size of the vp. it is worth noting that there are two randomization processes: one generates the input population of random vectors whose pqa scores are used to calculate the z-score, and the other represents the noise term in the pqa score definition.

defining noise proportions

to provide a quantification of the noise embedded in the vp, we calculate the z-scores from the distribution of pqa values of the randomized vectors. the randomized vectors are generated by scrambling the vp. the z-score of the vp is then interpolated to retrieve the estimated noise in the vp cluster.

effect of the length and number of partitions of the vector on the z-score distributions

since we want to compare the pqa with the noise, we randomized the vp times. we opted to describe the dynamics of the z-score given different percentages of noise and numbers of partitions. for this, we synthetically crafted vectors varying in both length, ranging from to elements, and number of classifications. the z-scores were retrieved from the crafted vectors using the formulas described above.

results and discussion

effects of permuted numeric labels on the partition

we wondered which assignment of numeric labels alters the sc calculations the least, so we analyzed how the sc changes over synthetic partitions with permuted labels. we began by generating synthetic partitions in ascending and descending order, increasing both the number of classifications and the number of items, up to . it is important to highlight that the number of items belonging to each classification was kept constant. because trying all possible permutations of each vector would be infeasible, we created a subset of permutations of each vector and then calculated the mean sc (figure , see methodology).
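the label-permutation experiment described above can be sketched as follows. this is a toy version under stated assumptions: the helper names, the partition size, the number of sampled permutations (`n_perm`) and the seed are all hypothetical choices, not values from the study.

```python
import random
from statistics import fmean


def serial_correlation(x):
    """Pearson correlation between x without its first item and x without its last."""
    a, b = x[1:], x[:-1]
    mean_a, mean_b = fmean(a), fmean(b)
    num = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    den = (sum((u - mean_a) ** 2 for u in a) * sum((v - mean_b) ** 2 for v in b)) ** 0.5
    return num / den if den else 0.0


def synthetic_partition(n_classes, items_per_class):
    """Ordered partition with sequential labels and a constant number of items per class."""
    return [label for label in range(1, n_classes + 1) for _ in range(items_per_class)]


def mean_sc_permuted_labels(partition, n_perm=100, seed=0):
    """Mean sc over a random subset of relabelings; the grouping itself is untouched."""
    rng = random.Random(seed)
    labels = sorted(set(partition))
    scores = []
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(labels, shuffled))
        scores.append(serial_correlation([relabel[v] for v in partition]))
    return fmean(scores)


if __name__ == "__main__":
    partition = synthetic_partition(n_classes=5, items_per_class=10)
    print("sequential labels:", serial_correlation(partition))
    print("mean over permuted labels:", mean_sc_permuted_labels(partition))
```

only the numbers attached to each group change between permutations, so any drop in the mean sc relative to the sequential labeling is attributable to the labeling alone.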
we observed that the mean sc became high when the number of items in the vp was greater than or equal to times the number of classifications; nevertheless, we obtained the highest sc when the numeric labels were assigned in sequential order, either ascending or descending (figure ).

figure . z-scores of the pqa scores from partitions varying in the number of classifications and the length of the partition.

length of partitions as a proxy of the number of classifications

we wondered whether the number of classifications and the length of the vp may change the statistical significance of the pqa score, because the fewer the items in the vp, the greater the chance of grouping the items in any order. we tested this effect by calculating a z-score from ordered synthetic partitions, increasing both the number of classifications and the number of items up to . we also kept constant the number of classifications for the sake of this analysis. we noticed that only the length of the partition has a true effect on the z-score; the number of classifications does not. we observed that every partition shorter than could be considered as pure noise; however, we consider a z-score cutoff of greater than (p-value of . ). we also observed z-score values still greater than with lengths of , , and , but less than with lengths between and (figure ). if we were more flexible, we could have set a length cutoff at those values without losing statistical significance, since a z-score of corresponds roughly to a p-value of . .
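the correspondence between a z-score cutoff and a p-value can be sketched with the standard normal distribution. the text does not state whether a one-sided or two-sided test was used, so the one-sided upper tail below is an assumption, and the z values in the demo are generic illustrations rather than the cutoffs of the study.

```python
import math


def one_sided_p_value(z):
    """Probability that a standard normal variable exceeds z (upper tail)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for z in (1.0, 2.0, 3.0):
        print(f"z = {z:.1f} -> one-sided p = {one_sided_p_value(z):.5f}")
```

a two-sided p-value would simply be twice the one-sided value for a positive z.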
the results of this analysis were expected by intuition, because the probability of an item occupying a position in the vp increases as the number of items does the same.

proof of concept: quantifying real noise

after a literature review, we noticed that the clusterings of some datasets were assessed only by visual inspection in their respective papers, so we applied our method to quantify the proportion of noise embedded in those datasets and to test whether they may lead to apophenia. we chose two datasets from the literature for two main reasons: first, the data should have a number of items well above our z-score significance threshold (> ) and, second, we wanted contrasting orderings of the partitions, with one dataset that looks very disordered and another that looks somewhat ordered, to compare the noise proportions. lastly, we assessed the behavior of the metric in highly ordered data, which also meets the threshold mentioned above.

cancer methylation signatures

the first dataset consists of methylation profiles of different cancerous and non-cancerous samples [ ] (figure ). although the classifications look very sparse and the groups are torn apart into many subgroups distributed along the data's vp, we detected . % of noise and a pqa score of . (figure , with a z-score of . and a p-value of . x - ). both numbers imply that even though there may be disorder in the vp, there is neither a very high noise proportion nor a high pqa score.
these results suggest that, as with any other statistical test, the larger the number of items in the partition, the more diluted the effect of disorder in the vp and the greater the statistical significance, as shown in the analysis of the number of items and classifications. although the authors concluded that their clustering results made sense given their molecular and biological background, as well as the perspectives about the analyzed profiles, they assessed the grouping only by visual inspection and concluded that the grouping was well done. however, understanding the noise in the cluster can help to pursue better markers, since it could help to narrow the search space in these kinds of studies.

figure . visual representation of clustered data used to assess the method. (a) dataset from jie shen et al. (b) dataset from toyooka et al.

distribution of micrornas in cancer

the second dataset consists of expression profiles of micrornas from three classes of samples: invasive breast cancer, ductal carcinoma in situ (dcis), and healthy (figure ) [ ]. the authors visually identified three clusters, though selecting the right cutting height threshold is difficult. besides, one of the clusters is a mix of classes in different proportions, leading the authors to arguably conclude that the dcis and control sample profiles are not different. on this matter, the pqa score and the proportion of noise are . and . %, respectively (figure , with a z-score of . and a p-value of . x - ), providing a quantitative assessment to support the grouping that the authors claimed. furthermore, in comparison with the methylation profiles discussed above, a partition that appears even less fuzzy has an even higher noise ratio, supporting the idea that visual inspection can lead to misleading results.

figure . z-score distribution by percentage of randomized items. (a) dataset from jie shen et al.
(b) dataset from toyooka et al. the red dots represent the z-score interpolation of the corresponding data sets.

comparison of genetic regulatory networks with theoretical models

finally, to assess the pqa methodology using systems biology data, we clustered networks according to their pairwise dissimilarity [ ]. first, curated biological networks were retrieved from abasy atlas (v . ) [ ]. for each biological network, we then constructed four networks, each according to a theoretical model (barabasi-albert, erdos-renyi, scale-free, and hierarchical-modular). we estimated the parameters of each theoretical model from the properties of the corresponding biological network. the models used reproduce one or more intrinsic characteristics of the biological networks, such as power-law distribution, hubs, scale-free degrees, and hierarchical modular structure [ ]. visual inspection suggested that the classification yielded a highly ordered vp, distinguishing the networks according to their nature (figure ). the pqa score for this vp is . (p-value = . x - , z-score = . ) and the proportion of noise was . % (figure ). in contrast to the previous examples, here we obtained a highly ordered clustering and a very low proportion of noise, which suggests that although the models recapitulate some of the properties of genetic regulatory networks, none of them alone is sufficient to capture their structural properties.

figure . cluster analysis of distance among gene regulatory networks and theoretical network models.
the abbreviations and colors used in the posterior classification are as follows: barabasi-albert (ba, red), erdos-renyi (er, blue), scale-free (sf, green), hierarchical modularity (hm, purple), and biological networks (bi, orange).

figure . z-score distribution by percentage of randomized items of the vp from genetic regulatory networks. the red dot represents the z-score interpolation of the actual data set.

conclusions

in this work, we presented a novel method to quantify the proportion of noise embedded in the grouping of associated classes of the elements in hierarchical clustering. we proposed a relative score derived from an sc of the vp from the dendrogram of any clustering analysis and calculated z-statistics as well as an interpolation to deliver an estimation of noise in the vp. we explained how the method is formulated and showed the tests we made to systematically refine it. we additionally provided a proof of concept by using clustering data from two works that we think represent overfitting by apophenia. we also added an example from network biology where clustered networks are separated by intrinsic characteristics. although in this work we focused on examples where hierarchical clustering is performed, this framework can apply to any partition algorithm in which the elements are identified and a vector of the order can be acquired. we concluded that the clustered sets of biological data have a high measure of noise, despite looking well grouped.
we showed the minimum number of classifications that should be considered in this sort of clustering analysis to obtain a significant reduction of noise. on the other hand, we permuted the labels of the associated classes and concluded that the effect is negligible. we showed that randomness still plays an important role by biasing the results, though it may not be evident through visual inspection. the pqa could be used as a benchmark to test which clustering algorithm is appropriate for the analyzed dataset by minimizing the noise proportion, and to guide omics experimental designs. nevertheless, a word of caution: the pqa score alone can be subject to subjectivity if not used properly, since it depends on the characteristics of the analyzed data. thus, the pqa score should be considered a quantification of noise in clustered data and should be used with discretion.

author contributions: conceptualization, j.a.f.g.; methodology, j.a.f.g.; software, d.a.c.h., v.e.n.c., and j.a.f.g.; validation, d.a.c.h., v.e.n.c., and j.a.f.g.; formal analysis, d.a.c.h., v.e.n.c., and j.a.f.g.; investigation, d.a.c.h., v.e.n.c., j.r.l.b., and j.a.f.g.; resources, j.a.f.g.; data curation, d.a.c.h., v.e.n.c., and j.e.l.b.; writing - original draft preparation, d.a.c.h., v.e.n.c., j.e.l.b., and j.a.f.g.; writing - review and editing, d.a.c.h., v.e.n.c., and j.a.f.g.; visualization, d.a.c.h., v.e.n.c., j.e.l.b., and j.a.f.g.; supervision, j.a.f.g.; project administration, j.a.f.g.; funding acquisition, j.a.f.g. all authors have read and agreed to the published version of the manuscript.
funding: this work was supported by the programa de apoyo a proyectos de investigación e innovación tecnológica (papiit-unam) [in to j.a.f.g.].

conflicts of interest: the authors declare no conflict of interest.

references

kang, s., kim, b., park, s.-b., et al. . stage-specific methylome screen identifies that nefl is downregulated by promoter hypermethylation in breast cancer. international journal of oncology ( ), pp. – , doi: . /ijo. . .
kiselev, v. y., andrews, t. s., & hemberg, m. ( ). challenges in unsupervised clustering of single-cell rna-seq data. nature reviews genetics, ( ), - , doi: . /s - - - .
al-harbi, s.h. and rayward-smith, v.j. . adapting k-means for supervised clustering. applied intelligence ( ), pp. – , doi: . /s - - - .
hassani, m., & seidl, t. ( ). using internal evaluation measures to validate the quality of diverse stream clustering algorithms. vietnam journal of computer science, ( ), - , doi: . /s - - - .
fyfe, s., williams, c., mason, o.j. and pickup, g.j. . apophenia, theory of mind and schizotypy: perceiving meaning and intentionality in randomness. cortex ( ), pp. – , doi: . /j.cortex. . . .
getmansky, m., lo, a.w. and makarov, i. . an econometric model of serial correlation and illiquidity in hedge fund returns. journal of financial economics ( ), pp. – , doi: . /j.jfineco. . . .
shen, j., hu, q., schrauder, m., et al. . circulating mir- b and mir- a as biomarkers for breast cancer detection. oncotarget ( ), pp. – , doi: . /oncotarget. .
toyooka, s., toyooka, k. o., maruyama, r., virmani, a. k., girard, l., miyajima, k., ... & brambilla, e. ( ). dna methylation profiles of lung tumors. molecular cancer therapeutics, ( ), - .
schieber, t. a., carpi, l., díaz-guilera, a., pardalos, p. m., masoller, c., & ravetti, m. g. ( ). quantification of network structural dissimilarities. nature communications, ( ), - .
escorcia-rodríguez, j. m., tauch, a., & freyre-gonzález, j. a. ( ). abasy atlas v .
: the most comprehensive and up-to-date inventory of meta-curated, historical, bacterial regulatory networks, their completeness and system-level characterization. computational and structural biotechnology journal, doi: . /j.csbj. . . .
barabasi, a. l., & oltvai, z. n. ( ). network biology: understanding the cell's functional organization. nature reviews genetics, ( ), - , doi: . /nrg .

mass spectrometry-based sequencing of the anti-flag-m antibody using multiple proteases and a dual fragmentation scheme

authors: weiwei peng #, matti f.
pronker #, joost snijder *
#equal contribution
*corresponding author: j.snijder@uu.nl

affiliation: biomolecular mass spectrometry and proteomics, bijvoet center for biomolecular research and utrecht institute of pharmaceutical sciences, utrecht university, padualaan , ch utrecht, the netherlands

keywords: mass spectrometry, antibody, de novo sequencing, ethcd, stepped hcd, herceptin, flag tag, anti-flag-m .

biorxiv preprint, this version posted january , . doi: https://doi.org/ . / . . . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by . international license (http://creativecommons.org/licenses/by/ . /).

abstract: antibody sequence information is crucial to understanding the structural basis for antigen binding and enables the use of antibodies as therapeutics and research tools. here we demonstrate a method for direct de novo sequencing of monoclonal igg from the purified antibody products. the method uses a panel of multiple complementary proteases to generate suitable peptides for de novo sequencing by lc-ms/ms in a bottom-up fashion. furthermore, we apply a dual fragmentation scheme, using both stepped high-energy collision dissociation (stepped hcd) and electron transfer high-energy collision dissociation (ethcd) on all peptide precursors. the method achieves full sequence coverage of the monoclonal antibody herceptin, with an accuracy of % in the variable regions. we applied the method to sequence the widely used anti-flag-m mouse monoclonal antibody, which we successfully validated by remodeling a high-resolution crystal structure of the fab and demonstrating binding to a flag-tagged target protein in western blot analysis. the method thus offers robust and reliable sequences of monoclonal antibodies.
introduction

antibodies can bind a great molecular diversity of antigens, owing to the high degree of sequence diversity that is available through somatic recombination, hypermutation, and heavy-light chain pairings. sequence information on antibodies is therefore crucial to understanding the structural basis of antigen binding, how somatic hypermutation governs affinity maturation, and, by mapping out the antibody repertoire, the adaptive immune response in health and disease. moreover, antibodies have become invaluable research tools in the life sciences and are ever more widely developed as therapeutic agents. in this context, sequence information is crucial for the use, production and validation of these important research tools and biopharmaceutical agents. antibody sequences are typically obtained through cloning and sequencing of the coding mrnas of the paired heavy and light chains. the sequencing workflows thereby rely on isolation of the antibody-producing cells from peripheral blood monocytes, or spleen and bone marrow tissues. these antibody-producing cells are not always readily available, however, and cloning/sequencing of the paired heavy and light chains is a non-trivial task with a limited success rate. moreover, antibodies are secreted in bodily fluids and mucus.
antibodies are thereby in large part functionally disconnected from their producing b-cell, which raises questions on how the secreted antibody pool relates quantitatively to the underlying b-cell population and whether there are potential sampling biases in current antibody sequencing strategies. direct mass spectrometry (ms)-based sequencing of the secreted antibody products is a useful complementary tool that can address some of the challenges faced by conventional sequencing strategies relying on cloning/sequencing of the coding mrnas. ms-based methods do not rely on the availability of the antibody-producing cells, but rather target the polypeptide products directly, offering the prospect of a next generation of serology, in which secreted antibody sequences might be obtained from any bodily fluid. whereas ms-based de novo sequencing still has a long way to go towards this goal, owing to limitations in sample requirements, sequencing accuracy, read length and sequence assembly, ms has been successfully used to profile the antibody repertoire and obtain (partial) antibody sequences beyond those available from conventional sequencing strategies based on cloning/sequencing of the coding mrnas. most ms-based strategies for antibody sequencing rely on a proteomics-type bottom-up lc-ms/ms workflow, in which the antibody product is digested into smaller peptides for ms analysis. available germline antibody sequences are then often used either as a template to guide
assembly of de novo peptide reads (such as in peaks ab), or as a starting point to iteratively identify somatic mutations to arrive at the mature antibody sequence (such as in supernovo). to maximize sequence coverage and aid read assembly, these ms-based workflows typically use a combination of complementary proteases and aspecific digestion to generate overlapping peptides. the most straightforward application of these ms-based sequencing workflows is the successful sequencing of monoclonal antibodies from (lost) hybridoma cell lines, but it also forms the basis of more advanced and challenging applications to characterize polyclonal antibody mixtures and profile the full antibody repertoire from serum.

here we describe an efficient protocol for ms-based sequencing of monoclonal antibodies. the protocol requires approximately picomol of the antibody product and sample preparation can be completed within one working day. we selected a panel of proteases with complementary specificities, which are active in the same buffer conditions for parallel digestion of the antibodies. we developed a dual fragmentation strategy for ms/ms analysis of the resulting peptides to yield rich sequence information from the fragmentation spectra of the peptides. the protocol yields full and deep sequence coverage of the variable domains of both heavy and light chains, as demonstrated on the monoclonal antibody herceptin. as a test case, we used our protocol to sequence the widely used anti-flag-m mouse monoclonal antibody, for which no sequence was publicly available despite its described use in + peer-reviewed publications. the protocol achieved full sequence coverage of the variable domains of both heavy and light chains, including all complementarity determining regions (cdrs).
the obtained sequence was successfully validated by remodeling the published crystal structure of the anti-flag-m fab and demonstrating binding of the synthetic recombinant antibody following the experimental sequence to a flag-tagged protein in western blot analysis. the protocol developed here thus offers robust and reliable sequencing of monoclonal antibodies with prospective applications for sequencing secreted antibodies from bodily fluids.

results

we used an in-solution digestion protocol, with sodium deoxycholate (sdc) as the denaturing agent, to generate peptides from the antibodies for lc-ms/ms analysis. following heat denaturation and disulfide bond reduction, we used iodoacetic acid as the alkylating agent to cap free cysteines. note that conventional alkylating agents like iodo-/chloroacetamide generate + da mass differences on cysteines and primary amines, which may lead to spurious assignments as glycine residues in de novo sequencing. the + da mass differences generated by alkylation with iodoacetic acid circumvent this potential pitfall. we chose a panel of proteases with activity at ph . - . , so that the denatured, reduced and alkylated antibodies could be easily split for parallel digestion under the same buffer conditions. these proteases (with indicated cleavage specificities) included: trypsin (c-terminal of r/k), chymotrypsin (c-terminal of f/y/w/m/l), α-lytic protease (c-terminal of t/a/s/v), elastase (unspecific), thermolysin (unspecific), lysn (n-terminal of k), lysc (c-terminal of k), aspn (n-terminal of d/e), and gluc (c-terminal of d/e).
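the cleavage specificities listed above can be turned into a small in-silico digestion sketch. this is an idealized illustration and not the workflow used in the study: the `PROTEASES` table and the `digest` function are hypothetical names, the rules are applied strictly as listed (without refinements such as proline exceptions), and the unspecific proteases (elastase, thermolysin) are left out because they have no simple rule. the test sequence is the flag tag epitope dykddddk.

```python
# Cleavage rules as listed in the text: (target residues, side of the residue that is cut).
PROTEASES = {
    "trypsin": ("RK", "C"),
    "chymotrypsin": ("FYWML", "C"),
    "alpha-lytic protease": ("TASV", "C"),
    "lysn": ("K", "N"),
    "lysc": ("K", "C"),
    "aspn": ("DE", "N"),
    "gluc": ("DE", "C"),
}


def digest(sequence, protease, missed_cleavages=0):
    """Return the peptides of an idealized digest, allowing up to N missed cleavages."""
    residues, side = PROTEASES[protease]
    cuts = {0, len(sequence)}
    for i, aa in enumerate(sequence):
        if aa in residues:
            # Cut after the residue (C-terminal rule) or before it (N-terminal rule).
            pos = i + 1 if side == "C" else i
            if 0 < pos < len(sequence):
                cuts.add(pos)
    cuts = sorted(cuts)
    peptides = []
    for i in range(len(cuts) - 1):
        # Joining adjacent fragments models missed cleavages, which yield longer reads.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(cuts))):
            peptides.append(sequence[cuts[i]:cuts[j]])
    return peptides


if __name__ == "__main__":
    flag = "DYKDDDDK"
    print(digest(flag, "trypsin"))                      # ['DYK', 'DDDDK']
    print(digest(flag, "lysn"))                         # ['DY', 'KDDDD', 'K']
    print(digest(flag, "trypsin", missed_cleavages=1))  # ['DYK', 'DYKDDDDK', 'DDDDK']
```

comparing the trypsin and lysn outputs shows how proteases with different specificities produce peptides with shifted boundaries, which is what creates the overlaps used for read assembly.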
correct placement or assembly of peptide reads is a common challenge in de novo sequencing, which can be facilitated by sufficient overlap between the peptide reads. this favors the occurrence of missed cleavages and longer reads, so we opted to perform a brief -hour digestion. following digestion, sdc is removed by precipitation and the peptide supernatant is desalted, ready for lc-ms/ms analysis. the resulting raw data were used for automated de novo sequencing with the supernovo software package. as peptide fragmentation is dependent on many factors like length, charge state, composition and sequence, we needed a versatile fragmentation strategy to accommodate the diversity of antibody-derived peptides generated by the proteases. we opted for a dual fragmentation scheme that applies both stepped high-energy collision dissociation (stepped hcd) and electron transfer high-energy collision dissociation (ethcd) on all peptide precursors. the stepped hcd fragmentation includes three collision energies to cover multiple dissociation regimes, and the ethcd fragmentation works especially well for higher charge states, also adding complementary c/z ions for maximum sequence coverage.

we used the monoclonal antibody herceptin (also known as trastuzumab) as a benchmark to test our protocol. from the total dataset of proteases, we collected peptide reads (defined as peptides with score >= , see methods for details), of which with superior stepped hcd fragmentation, and with superior ethcd fragmentation (see table s ). sequence coverage was % in both heavy and light chains across the variable and constant domains (see figures s and s ). the median depth of coverage was overall and slightly higher in the light chain (see table s and figure s - ). the median depth of coverage in the cdrs of both chains ranged from to .
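sequence coverage and depth of coverage, as reported above, can be illustrated with a toy calculation. the sketch assumes that each peptide read is placed by an exact match to a known reference sequence, which is a simplification (the study relies on supernovo for read assembly); the function name and the example reads are hypothetical.

```python
from statistics import median


def depth_of_coverage(reference, reads):
    """Count, for every reference position, how many reads cover it.

    Assumption of this sketch: each read is placed at its first exact match in
    the reference, and reads without an exact match are ignored.
    """
    depth = [0] * len(reference)
    for read in reads:
        start = reference.find(read)
        if start == -1:
            continue
        for i in range(start, start + len(read)):
            depth[i] += 1
    return depth


if __name__ == "__main__":
    reference = "DYKDDDDK"  # flag tag epitope, used here only as a toy sequence
    reads = ["DYK", "KDDDD", "DDDDK"]
    depth = depth_of_coverage(reference, reads)
    coverage = sum(d > 0 for d in depth) / len(depth)
    print(depth)            # [1, 1, 2, 2, 2, 2, 2, 1]
    print(coverage)         # 1.0
    print(median(depth))    # 2.0
```

overlapping reads from different proteases raise the depth at each position, which is why a multi-protease panel increases confidence in every residue call.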
figure . mass spectrometry-based de novo sequencing of the monoclonal antibody herceptin. the variable regions of the heavy (a) and light chains (b) are shown. the ms-based sequence is shown alongside the known herceptin sequence, with differences highlighted by asterisks (*). exemplary ms/ms spectra supporting the assigned sequences of the heavy and light chain cdrs are shown below the alignments. peptide sequence and fragment coverage are indicated on top of the spectra, with b/c ions indicated in blue and y/z ions in red. the same coloring is used to annotate peaks in the spectra, with additional peaks such as intact/charge reduced precursors, neutral losses and immonium ions indicated in green. note that to prevent overlapping peak labels, only a subset of successfully matched peaks is annotated.

the experimentally determined de novo sequence is shown alongside the known herceptin sequence for the variable domains of both chains in figure , with exemplary ms/ms spectra for the cdrs. we achieved an overall sequence accuracy of % with the automated sequencing procedure of supernovo, with incorrect assignments in the light chain. in framework of the light chain, i (isoleucine) was incorrectly assigned as the isomer leucine (l), a common ms-based sequencing error. in cdrl of the light chain, an additional misassignment was made for the dipeptide h /y , which was incorrectly assigned as w /n . the dipeptides hy and wn have identical masses, and the misassignment of w /n (especially w ) was poorly supported by the fragmentation spectra, in contrast to the correct h /y assignment (see c /c in fragmentation spectra, figure ). overall, the protocol yielded highly accurate sequences at a combined / positions of the variable domains in herceptin.

figure . mass spectrometry based de novo sequence of the mouse monoclonal anti-flag-m antibody. the variable regions of the heavy (a) and light chains (b) are shown. the ms-based sequence is shown alongside the previously published sequence in the crystal structure of the fab (pdb id: g ), and the germline sequence (imgt-domaingapalign; ighv - /ighj ; igkv - /igkj ). differential residues are highlighted by asterisks (*). exemplary ms/ms spectra in support of the assigned sequences are shown below the alignments. peptide sequence and fragment coverage are indicated on top of the spectra, with b/c ions indicated in blue and y/z ions in red. the same coloring is used to annotate peaks in the spectra, with additional peaks such as intact/charge reduced precursors, neutral losses and immonium ions indicated in green. note that to prevent overlapping peak labels, only a subset of successfully matched peaks is annotated.
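the mass coincidences behind the herceptin misassignments, and behind the choice of alkylating agent in the digestion protocol, can be checked with a few lines of arithmetic. the monoisotopic masses below are standard reference values supplied for this sketch and are not taken from the text; the dictionary and function names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard reference values, not from the text).
RESIDUE_MASS = {
    "G": 57.02146,
    "H": 137.05891,
    "Y": 163.06333,
    "W": 186.07931,
    "N": 114.04293,
    "I": 113.08406,
    "L": 113.08406,
}

# Mass added to cysteine by the two alkylation chemistries (standard values).
CARBAMIDOMETHYL = 57.02146  # iodo-/chloroacetamide
CARBOXYMETHYL = 58.00548    # iodoacetic acid


def residue_mass(peptide):
    """Sum of residue masses (no terminal water), enough to compare isobaric stretches."""
    return sum(RESIDUE_MASS[aa] for aa in peptide)


if __name__ == "__main__":
    # HY and WN are isobaric, so mass alone cannot distinguish them.
    print(abs(residue_mass("HY") - residue_mass("WN")) < 1e-4)  # True
    # Isoleucine and leucine are isomers with the same mass.
    print(RESIDUE_MASS["I"] == RESIDUE_MASS["L"])                # True
    # Carbamidomethylation mimics an extra glycine; carboxymethylation does not.
    print(abs(CARBAMIDOMETHYL - RESIDUE_MASS["G"]) < 1e-4)       # True
    print(abs(CARBOXYMETHYL - RESIDUE_MASS["G"]) > 0.9)          # True
```

because these pairs cannot be separated by precursor or fragment-sum mass, only fragment ions that split the dipeptide (or side-chain specific fragments for i/l) can resolve them, which is where the quality of the fragmentation spectra matters.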
we next applied our sequencing protocol to the mouse monoclonal anti-flag-m antibody as a test case. despite the widespread use of anti-flag-m to detect and purify flag-tagged proteins, the only publicly available sequences can be found in the crystal structure of the fab. the modelled sequence of the original crystal structure had to be inferred from germline sequences that could match the experimental electron density and also includes many placeholder alanines at positions that could not be straightforwardly interpreted. the full anti-flag-m dataset from the proteases included peptide reads (with scores >= ); with superior stepped hcd fragmentation spectra, and with superior ethcd spectra. we achieved full sequence coverage of the variable regions of both heavy and light chains, with a median depth of coverage in the cdrs ranging from to (see table s ). as for herceptin, the depth of coverage was better in the light chain compared to the heavy chain (see figure s - s ). the full ms-based anti-flag-m sequences can be found in fasta format in the supplementary information.

figure . validation of the ms-based anti-flag-m sequence. a) the previously published crystal structure of the anti-flag-m fab was remodeled with the experimentally determined sequence, shown in surface rendering with cdrs and differential residues highlighted in colors. b) fo-fc electron density of the new refined map contoured at rmsd is shown in blue and fo-fc positive difference density of the original deposited map contoured at . rmsd in green around the cdr loops of the heavy and light chains. differential residues between the published crystal structure and the model based on our antibody sequencing are indicated in purple. c) western blot validation of the synthetic recombinant anti-flag-m antibody produced with the experimentally determined sequence demonstrates equivalent flag-tag binding compared to commercial anti-flag-m (see also figure s ).
the ms-based sequences of anti-flag-m are shown alongside the crystal structure sequences and the inferred germline precursors with exemplary ms/ms spectra for the cdrs in figure . the experimentally determined sequence reveals that anti-flag-m is a mouse igg , with an ighv - /ighj heavy chain and igkv - /igkj kappa light chain. the experimentally determined sequence differs at and positions in the heavy and light chain of the fab crystal structure, respectively. to validate the experimentally determined sequences, we remodeled the crystal structure using the ms-based heavy and light chains, resulting in much improved model statistics (see figure and table s ). the experimental electron densities show excellent support of the ms-based sequence (as shown for the cdrs in figure b). a notable exception is l in cdrh of the heavy chain. the ms-based sequence was assigned as leucine, but the experimental electron density supports assignment of the isomer isoleucine instead (see figure s ). in contrast to the original model, our new ms-based model reveals a predominantly positively charged paratope (see figure s ), which potentially complements the - net charge of the flag tag epitope (dykddddk) to mediate binding. the experimentally determined anti-flag-m sequence, with the l i correction, was further validated by testing binding of the synthetic recombinant antibody to a purified flag-tagged protein in western blot analysis (see figure c and s ).
the synthetic recombinant antibody showed equivalent binding compared to the original antibody sample used for sequencing, confirming that the experimentally determined sequence is reliable to obtain the recombinant antibody product with the desired functional profile.

discussion

there are four other monoclonal antibody sequences against the flag tag publicly available through the abcd (antibodies chemically defined) database. comparison of the cdrs of anti-flag-m with these additional four monoclonal antibodies reveals a few common motifs that may determine flag-tag binding specificity (see table s ). in the heavy chain, the only common motif between all five monoclonals is that the first three residues of cdrh follow a gxs sequence. in addition, the last three residues of cdrh of anti-flag-m are ydy, similar to mdy in h , and ydf in eeh . (and eeh . also ends cdrh with an aromatic f residue). in contrast to the heavy chain, the cdrs of the light chain are almost completely conserved in / monoclonals with only minimal differences compared to germline. the anti-flag-m and h monoclonals were specifically raised in mice against the flag-tag epitope, whereas the computationally designed eeh . and eeh . monoclonals contain the same light chain from an ee-dipeptide tag directed antibody. this suggests that the igkv - /igkj light chain may be a common determinant of binding to a small negatively charged peptide epitope like the flag-tag and is readily available as a hardcoded germline sequence in the mouse antibody repertoire.
the availability of the anti-flag-m sequences may contribute to the wider use of this important research tool, as well as the development and engineering of better flag-tag directed antibodies. this example illustrates that our ms-based sequencing protocol yields robust and reliable monoclonal antibody sequences. the protocol described here also formed the basis of a recent application where we sequenced an antibody directly from patient-derived serum, in combination with top-down fragmentation of the isolated fab fragment. the dual fragmentation strategy yields high-quality spectra suitable for de novo sequencing and may further contribute to the exciting prospect of a new era of serology in which antibody sequences can be directly obtained from bodily fluids.

methods

sample preparation

anti-flag m antibody was purchased from sigma (catalogue number f ). herceptin was provided by roche (penzberg, germany). μg of each sample was denatured in % sodium deoxycholate (sdc), mm tris-hcl, mm tris( -carboxyethyl)phosphine (tcep), ph . at °c for min, followed by min incubation at °c for reduction. the sample was then alkylated by adding iodoacetic acid to a final concentration of mm and incubated in the dark at room temperature for min. μg sample was then digested by one of the following proteases: trypsin, chymotrypsin, lysn, lysc, gluc, aspn, alp, thermolysin and elastase in a : ratio (w:w) in a total volume of µl of mm ammonium bicarbonate at °c for h. after digestion, sdc was removed by adding µl formic acid (fa) and centrifugation at g for min. following centrifugation, the supernatant containing the peptides was collected for desalting on a µm oasis hlb -well plate (waters). the oasis hlb sorbent was activated with % acetonitrile and subsequently equilibrated with % formic acid in water. next, peptides were bound to the sorbent, washed twice with % formic acid in water and eluted with µl of % acetonitrile/ % formic acid in water (v/v).
the eluted peptides were vacuum-dried and reconstituted in µl % fa.

mass spectrometry

the digested peptides (single injection of . µg) were separated by online reversed phase chromatography on an agilent uhplc (column packed with poroshell ec c ; dimensions cm x µm, . µm, agilent technologies) coupled to a thermo scientific orbitrap fusion mass spectrometer. samples were eluted over a min gradient from % to % acetonitrile at a flow rate of . μl/min. peptides were analyzed with a resolution setting of in ms . ms scans were obtained with standard agc target, maximum injection time of ms, and scan range - . the precursors were selected with a m/z window and fragmented by stepped hcd as well as ethcd. the stepped hcd fragmentation included steps of %, % and % nce. ethcd fragmentation was performed with calibrated charge-dependent etd parameters and % nce supplemental activation. for both fragmentation types, ms scans were acquired at resolution, % normalized agc target, ms maximum injection time, scan range - .

ms data analysis

automated de novo sequencing was performed with supernovo (version . , protein metrics inc.). custom parameters were used as follows: non-specific digestion; precursor and product mass tolerance was set to ppm and . da respectively; carboxymethylation (+ .
) on cysteine was set as fixed modification; oxidation on methionine and tryptophan was set as variable common modification; carboxymethylation on the n-terminus, pyroglutamic acid conversion of glutamine and glutamic acid on the n-terminus, deamidation on asparagine/glutamine were set as variable rare modifications. peptides were filtered for score >= for the final evaluation of spectrum quality and (depth of) coverage. supernovo generates peptide groups for redundant ms/ms spectra, including also when stepped hcd and ethcd fragmentation on the same precursor both generate good peptide-spectrum matches. in these cases only the best-matched spectrum is counted as representative for that group. this criterion was used in counting the number of peptide reads reported in table s . germline sequences and cdr boundaries were inferred using imgt/domaingapalign.

revision of the anti-flag-m fab crystal structure model

as a starting point for model building, the reflection file and coordinates of the published anti-flag-m fab crystal structure were used (pdb id: g ). care was taken to use the original rfree labels of the deposited reflection file for refinement, so as not to introduce extra model bias. differential residues between this structure and our mass spectrometry-derived anti-flag sequence were manually mutated and fitted in the density using coot. many spurious water molecules that caused severe steric clashes in the original model were also manually removed in coot. densities for two sulfate and one chloride ion were identified and built into the model.
the original crystallization solution contained . m ammonium sulfate. iterative cycles of model geometry optimization in real space in coot and reciprocal space refinement by phenix were used to generate the final model, which was validated with molprobity.

cloning and expression of synthetic recombinant anti-flag-m

to recombinantly express full-length anti-flag-m , the proteomic sequences of both the light and heavy chains were reverse-translated and codon optimized for expression in human cells using the integrated dna technologies (idt) web tool (http://www.idtdna.com/codonopt). for the linker and fc region of the heavy chain, the standard mouse ig gamma- (ighg ) amino acid sequence (uniprot p . ) was used. an n-terminal secretion signal peptide derived from human igg light chain (meapaqllfllllwlpdttg) was added to the n-termini of both heavy and light chains. bamhi and noti restriction sites were added to the ’ and ’ ends of the coding regions, respectively. only for the light chain, a double stop codon was introduced at the ’ site before the noti restriction site. the coding regions were subcloned using bamhi and noti restriction-ligation into a prk expression vector with a c-terminal octahistidine tag between the noti site and a double stop codon ’ of the insert, so that only the heavy chain has a c-terminal aaahhhhhhhh sequence for nickel-affinity purification (the triple alanine resulting from the noti site). the l i correction in the heavy chain was introduced later (after observing it in the crystal structure) by iva cloning. expression plasmids for the heavy and light chain were mixed in a : (w/w) ratio for transient transfection in hek cells with polyethylenimine, following standard procedures. medium was collected days after transfection and cells were spun down by minutes of centrifugation at g. antibody was directly purified from the supernatant using ni-sepharose excel resin (cytiva life sciences), washing with mm nacl, mm cacl ,
mm imidazole, mm hepes ph . and eluting with mm nacl, mm cacl , mm imidazole, mm hepes ph . .

western blot validation of anti-flag-m binding

to test binding of our recombinant anti-flag-m to the flag-tag epitope, compared to the commercially available anti-flag-m (sigma), we used both antibodies to probe western blots of a flag-tagged protein in parallel. purified rabies virus glycoprotein ectodomain (sad b strain, uniprot residues - ) with or without a c-terminal flag-tag followed by a foldon trimerization domain and an octahistidine tag was heated to °c in xt sample buffer (biorad) for minutes. samples were run twice on a criterion xt - % polyacrylamide gel (biorad) in mes xt buffer (biorad) before western blot transfer to a nitrocellulose membrane in tris-glycine buffer (biorad) with % methanol. the membrane was blocked with % (w/v) dry non-fat milk in phosphate-buffered saline (pbs) overnight at °c. the membrane was cut in two (one half for the commercial and one half for the recombinant anti-flag-m ) and each half was probed with either commercial (sigma) or recombinant anti-flag-m at µg/ml in pbs for minutes. after washing three times with pbst (pbs with . % v/v tween ), polyclonal goat anti-mouse fused to horseradish peroxidase (hrp) was used to detect binding of anti-flag-m to the flag-tagged protein for both membranes. the membranes were washed three more times with pbst before applying enhanced chemiluminescence (ecl; pierce) reagent to image the blots in parallel.
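As a note on the cloning design described in the Methods, the triple alanine that precedes the octahistidine tag follows directly from reading the NotI recognition site in frame. The sketch below is illustrative only and is not the authors' construct: it assumes the 8-bp NotI recognition sequence (GCGGCCGC) starts on a codon boundary, and the function name `translate_noti_in_frame` is hypothetical. Since every codon of the form GCN encodes alanine, the result does not depend on the base that follows the site in the vector.

```python
# NotI recognition sequence (a well-known 8-bp palindromic restriction site).
NOTI_SITE = "GCGGCCGC"


def translate_noti_in_frame(next_base: str) -> str:
    """Translate the NotI site plus one following base, read in frame 0.

    Only alanine codons (GCN) are recognised; anything else is marked 'X'.
    The in-frame position of the site is an assumption of this sketch.
    """
    seq = NOTI_SITE + next_base.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # All four GCN codons (GCA, GCC, GCG, GCT) encode alanine.
    return "".join("A" if codon.startswith("GC") else "X" for codon in codons)


if __name__ == "__main__":
    for base in "ACGT":
        print(base, translate_noti_in_frame(base))
```

Under this in-frame assumption the site always contributes three alanines, consistent with the c-terminal aaahhhhhhhh sequence reported for the heavy chain.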
data availability

the raw lc-ms/ms data have been deposited to the proteomexchange consortium via the pride partner repository with the dataset identifier pxd . the coordinates and reflection file with phases for the remodeled crystal structure of the anti-flag-m fab have been deposited in the protein data bank under accession code bg .

acknowledgements

herceptin was a kind gift from roche (penzberg, germany). we would like to acknowledge support by protein metrics inc. through access to supernovo software and helpful discussion on de novo antibody sequencing. we would like to thank everyone in the biomolecular mass spectrometry and proteomics group at utrecht university for support and helpful discussions. this research was funded by the dutch research council nwo gravitation boo, institute for chemical immunology (ici; . . ).

author contributions

wp and js conceived of the project. wp carried out the ms experiments. wp and js analyzed the ms data. mfp remodeled the crystal structure. mfp cloned and produced the synthetic recombinant antibody and carried out western blotting. js supervised the project. js wrote the first draft and all authors contributed to preparing the final version of the manuscript.

competing interests

the authors declare no competing interests.

references

. tonegawa, s., somatic generation of antibody diversity. nature , ( ), - . . watson, c. t.; glanville, j.; marasco, w. a., the individual and population genetics of antibody immunity. trends in immunology , ( ), - . . carter, p. j.; lazar, g.
a., next generation antibody drugs: pursuit of the 'high-hanging fruit'. nature reviews drug discovery , ( ), . . grilo, a. l.; mantalaris, a., the increasingly human and profitable monoclonal antibody market. trends in biotechnology , ( ), - . . baker, m., blame it on the antibodies. nature , ( ), . . uhlen, m.; bandrowski, a.; carr, s.; edwards, a.; ellenberg, j.; lundberg, e.; rimm, d. l.; rodriguez, h.; hiltke, t.; snyder, m., a proposal for validation of antibodies. nature methods , ( ), - . . fischer, n. in sequencing antibody repertoires: the next generation, mabs, taylor & francis: ; pp - . . georgiou, g.; ippolito, g. c.; beausang, j.; busse, c. e.; wardemann, h.; quake, s. r., the promise and challenge of high-throughput sequencing of the antibody repertoire. nature biotechnology , ( ), - . . robinson, w. h., sequencing the functional antibody repertoire—diagnostic and therapeutic discovery. nature reviews rheumatology , ( ), . . boutz, d. r.; horton, a. p.; wine, y.; lavinder, j. j.; georgiou, g.; marcotte, e. m., proteomic identification of monoclonal antibodies from serum. analytical chemistry , ( ), - . . castellana, n. e.; mccutcheon, k.; pham, v. c.; harden, k.; nguyen, a.; young, j.; adams, c.; schroeder, k.; arnott, d.; bafna, v., resurrection of a clinical antibody: template proteogenomic de novo proteomic sequencing and reverse engineering of an anti-lymphotoxin- α antibody. proteomics , ( ), - . . chen, j.; zheng, q.; hammers, c. m.; ellebrecht, c. t.; mukherjee, e.
m.; tang, h.-y.; lin, c.; yuan, h.; pan, m.; langenhan, j., proteomic analysis of pemphigus autoantibodies indicates a larger, more diverse, and more dynamic repertoire than determined by b cell genetics. cell reports , ( ), - . . cheung, w. c.; beausoleil, s. a.; zhang, x.; sato, s.; schieferl, s. m.; wieler, j. s.; beaudet, j. g.; ramenani, r. k.; popova, l.; comb, m. j., a proteomics approach for the identification and cloning of monoclonal antibodies from serum. nature biotechnology , ( ), - . . guthals, a.; gan, y.; murray, l.; chen, y.; stinson, j.; nakamura, g.; lill, j. r.; sandoval, w.; bandeira, n., de novo ms/ms sequencing of native human antibodies. journal of proteome research , ( ), - . . lee, j.; boutz, d. r.; chromikova, v.; joyce, m. g.; vollmers, c.; leung, k.; horton, a. p.; dekosky, b. j.; lee, c.-h.; lavinder, j. j., molecular-level analysis of the serum antibody repertoire in young adults before and after seasonal influenza vaccination. nature medicine , ( ), - . . lee, j.; paparoditis, p.; horton, a. p.; frühwirth, a.; mcdaniel, j. r.; jung, j.; boutz, d. r.; hussein, d. a.; tanno, y.; pappas, l., persistent antibody clonotypes dominate the serum response to influenza over multiple years and repeated vaccinations. cell host & microbe , ( ), - . e . . lindesmith, l. c.; mcdaniel, j. r.; changela, a.; verardi, r.; kerr, s. a.; costantini, v.; brewer-jensen, p. d.; mallory, m. l.; voss, w. n.; boutz, d. r., sera antibody repertoire analyses reveal mechanisms of broad and pandemic strain neutralizing responses after human norovirus vaccination. immunity , ( ), - . e . . bandeira, n.; pham, v.; pevzner, p.; arnott, d.; lill, j. r., automated de novo protein sequencing of monoclonal antibodies. nature biotechnology , ( ), - . . rickert, k. w.; grinberg, l.; woods, r. m.; wilson, s.; bowen, m. a.; baca, m. in combining phage display with de novo protein sequencing for reverse engineering of monoclonal antibodies, mabs, taylor & francis: ; pp - . . 
savidor, a.; barzilay, r.; elinger, d.; yarden, y.; lindzen, m.; gabashvili, a.; tal, o. a.; levin, y., database-independent protein sequencing (dips) enables full-length de novo protein and antibody sequence determination. molecular & cellular proteomics , ( ), - . . sen, k. i.; tang, w. h.; nayak, s.; kil, y. j.; bern, m.; ozoglu, b.; ueberheide, b.; davis, d.; becker, c., automated antibody de novo sequencing and its utility in biopharmaceutical discovery. journal of the american society for mass spectrometry , ( ), - . . sousa, e.; olland, s.; shih, h. h.; marquette, k.; martone, r.; lu, z.; paulsen, j.; gill, d.; he, t., primary sequence determination of a monoclonal antibody against α-synuclein using a novel mass spectrometry-based approach. international journal of mass spectrometry , , - . . tran, n. h.; rahman, m. z.; he, l.; xin, l.; shan, b.; li, m., complete de novo assembly of monoclonal antibody sequences. scientific reports , ( ), - . . brizzard, b. l.; chubet, r. g.; vizard, d., immunoaffinity purification of flag epitope-tagged bacterial alkaline phosphatase using a novel monoclonal antibody and peptide elution. biotechniques , ( ), - . . sigma-aldrich anti-flag-m f product page. https://www.sigmaaldrich.com/catalog/product/sigma/f ?lang=en&region=nl (accessed - - ). . paizs, b.; suhai, s., fragmentation pathways of protonated peptides. mass spectrometry reviews , ( ), - . . diedrich, j. k.; pinto, a. f.; yates iii, j. r., energy dependence of hcd on peptide fragmentation: stepped collisional energy finds the sweet spot.
journal of the american society for mass spectrometry , ( ), - . . frese, c. k.; altelaar, a. m.; van den toorn, h.; nolting, d.; griep-raming, j.; heck, a. j.; mohammed, s., toward full peptide sequence coverage by dual fragmentation combining electron-transfer and higher-energy collision dissociation tandem mass spectrometry. analytical chemistry , ( ), - . . frese, c. k.; zhou, h.; taus, t.; altelaar, a. m.; mechtler, k.; heck, a. j.; mohammed, s., unambiguous phosphosite localization using electron-transfer/higher-energy collision dissociation (ethcd). journal of proteome research , ( ), - . . carter, p.; presta, l.; gorman, c. m.; ridgway, j.; henner, d.; wong, w.; rowland, a. m.; kotts, c.; carver, m. e.; shepard, h. m., humanization of an anti-p her antibody for human cancer therapy. proceedings of the national academy of sciences , ( ), - . . slamon, d. j.; leyland-jones, b.; shak, s.; fuchs, h.; paton, v.; bajamonde, a.; fleming, t.; eiermann, w.; wolter, j.; pegram, m., use of chemotherapy plus a monoclonal antibody against her for metastatic breast cancer that overexpresses her . new england journal of medicine , ( ), - . . einhauer, a.; jungbauer, a., the flag™ peptide, a versatile fusion tag for the purification of recombinant proteins. journal of biochemical and biophysical methods , ( - ), - . . roosild, t. p.; castronovo, s.; choe, s., structure of anti-flag m fab domain and its use in the stabilization of engineered membrane proteins. acta crystallographica section f: structural biology and crystallization communications , ( ), - . . entzminger, k.
c.; hyun, j.-m.; pantazes, r. j.; patterson-orazem, a. c.; qerqez, a. n.; frye, z. p.; hughes, r. a.; ellington, a. d.; lieberman, r. l.; maranas, c. d., de novo design of antibody complementarity determining regions binding a flag tetra-peptide. scientific reports , ( ), - . . ikeda, k.; koga, t.; sasaki, f.; ueno, a.; saeki, k.; okuno, t.; yokomizo, t., generation and characterization of a human-mouse chimeric high-affinity antibody that detects the dykddddk flag peptide. biochemical and biophysical research communications , ( ), - . . lima, w. c.; gasteiger, e.; marcatili, p.; duek, p.; bairoch, a.; cosson, p., the abcd database: a repository for chemically defined antibodies. nucleic acids research , (d ), d -d . . bondt, a.; hoek, m.; tamara, s.; de graaf, b.; peng, w.; schulte, d.; den boer, m. a.; greisch, j.-f.; varkila, m. r.; snijder, j., human plasma igg repertoires are simple, unique, and dynamic. ssrn . . ehrenmann, f.; kaas, q.; lefranc, m.-p., imgt/ dstructure-db and imgt/domaingapalign: a database and a tool for immunoglobulins or antibodies, t cell receptors, mhc, igsf and mhcsf. nucleic acids research , (suppl_ ), d -d . . ehrenmann, f.; lefranc, m.-p., imgt/domaingapalign: imgt standardized analysis of amino acid sequences of variable, constant, and groove domains (ig, tr, mh, igsf, mhsf). cold spring harbor protocols , ( ), pdb. prot . . emsley, p.; cowtan, k., coot: model-building tools for molecular graphics. acta crystallographica section d: biological crystallography , ( ), - . . afonine, p. v.; grosse-kunstleve, r. w.; echols, n.; headd, j. j.; moriarty, n. w.; mustyakimov, m.; terwilliger, t. c.; urzhumtsev, a.; zwart, p. h.; adams, p. d., towards automated crystallographic structure refinement with phenix. refine. acta crystallographica section d: biological crystallography , ( ), - . . chen, v. b.; arendall, w. b.; headd, j. j.; keedy, d. a.; immormino, r. m.; kapral, g. j.; murray, l. w.; richardson, j. s.; richardson, d. 
c., molprobity: all-atom structure validation for macromolecular crystallography. acta crystallographica section d: biological crystallography , ( ), - . . fuglsang, a., codon optimizer: a freeware tool for codon optimization. protein expression and purification , ( ), - . . garcía-nafría, j.; watson, j. f.; greger, i. h., iva cloning: a single-tube universal cloning system exploiting bacterial in vivo assembly. scientific reports , , .

comprehensive multi-omics study of the molecular perturbations induced by simulated diabetes on coronary artery endothelial cells

aldo moreno-ulloa , *, hilda carolina delgado-de la herrán , , carolina Álvarez-delgado , omar mendoza-porras , rommel a. carballo-castañeda and francisco villarreal ,

ms laboratory, biomedical innovation department, center for scientific research and higher education of ensenada (cicese), baja california, méxico
specialized laboratory in metabolomics and proteomics (metpro), cicese, méxico
mitochondrial biology laboratory, biomedical innovation department, center for scientific research and higher education of ensenada (cicese), baja california, méxico
csiro livestock and aquaculture, queensland bioscience precinct, carmody rd, st lucia, qld, australia
school of medicine, university of california, san diego, ca, usa
san diego va healthcare system

* to whom correspondence should be addressed: biomedical innovation department, cicese carretera ensenada-tijuana no. , zona playitas, cp. , ensenada, b.c. mexico, phone: + ( ) - - ext. , e-mail: amoreno@cicese.mx
/ . . . http://creativecommons.org/licenses/by-nc-nd/ . / https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / abstract coronary artery endothelial cells (caec) exert an important role in the development of cardiovascular disease. dysfunction of caec is associated with cardiovascular disease in subjects with type diabetes mellitus (t dm). however, comprehensive studies of the effects that a diabetic environment exerts on this cellular type scarce. the present study characterized the molecular perturbations occurring on cultured bovine caec subjected to a prolonged diabetic environment (high glucose [hg] and high insulin [hi]). changes at the metabolite and peptide level were assessed by untargeted metabolomics and chemoinformatics, and the results were integrated with proteomics data using published swath-based proteomics on the same in vitro model. our findings were consistent with reports on other endothelial cell types, but also identified novel signatures of dna/rna, aminoacid, peptide, and lipid metabolism in cells under a diabetic environment. manual data inspection revealed disturbances on tryptophan catabolism and biosynthesis of phenylalanine-based, glutathione-based, and proline-based peptide metabolites. fluorescence microscopy detected an increase in binucleation in cells under treatment that also occurred when human caec were used. this multi-omics study identified particular molecular perturbations in an induced diabetic environment that could help unravel the mechanisms underlying the development of cardiovascular disease in subjects with t dm. 
keywords: swath-proteomics; metabolomics; type diabetes mellitus; endothelial cells; feature-based molecular networking

introduction

damage to coronary artery endothelial cells (caec) leads to coronary endothelial dysfunction, which is associated with the development of cardiac pathologies in subjects with and without coronary atherosclerosis ( ). subjects with type diabetes mellitus (t dm) are at particularly increased risk of myocardial infarction ( ), and coronary endothelial dysfunction has been implicated in the prognosis ( ). a high-glucose (hg) environment, a hallmark of t dm, leads to impairment of nitric oxide signaling, cell cycle ( ), apoptosis ( ), angiogenesis ( ), and dna structure ( ). however, given the intrinsic heterogeneity of the endothelium, the molecular perturbations caused by hg vary according to the type of endothelial cell studied ( , ). for instance, human microvascular endothelial cells showed increased gene expression of endothelial nitric oxide synthase, superoxide dismutase , glutathione peroxidase , thioredoxin reductase and compared to the regulation observed in human umbilical vein endothelial cells (huvec) when cultured in hg for h. furthermore, the response of endothelial cells to hg is influenced by the duration of exposure ( , ), as demonstrated in bovine aortic and human microvascular endothelial cells, where cell proliferation and apoptosis were higher at < h compared to weeks of exposure ( ).
in another example of time-dependent response, increased apoptosis (derived from dna fragmentation) and tumor necrosis factor alpha protein levels were reported in human coronary artery endothelial cells (hcaec) after only h of incubation with hg ( ). hence, the molecular response to hg cannot be generalized among endothelial cell types. previously we reported impaired mitochondrial function/structure and nitric oxide signaling in hcaec treated with hg for h ( ). however, a h study documented an increase in pro-inflammatory cytokines ( ) and oxidative stress in hcaec ( ). the long-term (> h) effect of hg in caec has not been documented as extensively as in other endothelial cell types. characterizing the effect of hg on caec may allow us to identify key signaling pathways (or specific biomolecules) associated with the development of endothelial dysfunction and cardiac pathologies. here, liquid chromatography coupled to mass spectrometry (lc-ms )-based untargeted metabolomics and swath-based quantitative proteomics data, as well as bio- and chemo-informatics, were used to characterize the molecular perturbations occurring in bovine coronary artery endothelial cells (bcaec) under a prolonged diabetic environment.

methods

chemical and reagents

recombinant human insulin was purchased from sigma aldrich (st. louis, mo, usa). antibiotic-antimitotic solution, trypsin-edta solution . %, hank’s balanced salt solution (hbss) without phenol red, dulbecco’s modified eagle’s media (dmem) with glutamine, fetal bovine serum (fbs), hoechst , pentahydrate (bis-benzimide)-fluoropure™, and methanol-free formaldehyde ( % solution) were obtained from thermo fisher scientific (waltham, ma, usa). methanol, acetonitrile, and water were optima™ lc-ms grade and obtained from fisher scientific (hampton, nh, usa). ethanol lichrosolv® grade was obtained from merck kgaa (darmstadt, germany). rabbit anti-von willebrand factor (vwf) antibody and goat anti-rabbit igg conjugated to alexa fluor were obtained from abcam (cambridge, ma, usa).

cell culture

bcaec were purchased from cell applications, inc. (san diego, ca, usa) and grown as previously described ( ). in brief, cells were grown with dmem ( . mmol/l glucose, supplemented with % fbs and % antibiotic-antimitotic solution) at °c in an incubator with a humidified atmosphere of % co . before experiments, cells were switched to dmem with % fbs for h to maintain the cells under a quiescent state. the model to simulate diabetes is described in ( ) (figure ). endothelial cells were cultured for days to determine the chronic molecular perturbations caused by simulated diabetes and to avoid the early (within h) cell proliferation effects caused by hg ( , ). in brief, cells were first treated with nmol/l insulin (high-insulin, hi) in normal glucose (ng, . mmol/l in dmem) for days ( ) and then maintained in high-glucose (hg, mmol/l in dmem) and constant hi for days. this sequential scheme was intended to mimic the pathophysiological conditions that occur in t dm patients, wherein hyperinsulinemia precedes hyperglycemia ( ). cells were used at passages between to . the control group received neither hi nor hg treatment. for selected experiments (binucleation analysis), hcaec ( years old caucasian male, history of t dm for > years) were purchased from cell applications, inc. and subjected to the same conditions as bcaec but using mesoendo growth medium (cell applications, inc.)
to induce proliferation. for simulated diabetes, hcaec were treated with hi and hg as with bcaec, but mesoendo growth medium was used instead. for consistency, the group that underwent simulated diabetes (hg + hi) will be referred to as the “experimental group”. all experiments were carried out in triplicate.

immunofluorescence

as previously described ( ), , cells per well were seeded onto -well plates (corning® cellbind®) and exposed to simulated diabetes. thereafter, bcaec and hcaec were washed with pbs to remove dead cells and debris. cells were fixed, permeabilized, and blocked as described before ( ). cells were then incubated with a polyclonal antibody against the vwf ( : , % bsa in pbs) overnight at °c and thereafter washed x with pbs. alexa fluor -labeled anti-rabbit ( : in pbs) was then used as a secondary antibody for h at rt and washed x with pbs. as a negative control, cells were incubated only with secondary antibody to assess for non-specific binding. cell nuclei were stained with hoechst ( µg/ml in hbss) for min and washed x with pbs. fluorescent images were taken in at least three random fields per condition using an evos® floid® cell imaging station with a fixed x air objective. image analysis was performed with imagej software (version . . ).

metabolite extraction

cells were seeded at , cells per well in -well plates (corning® cellbind®) and treated as above. after hg and hi conditions, metabolites were extracted following a published protocol for adherent cells with some modifications ( ) (figure ).
in brief, after washing the cells x with pbs, µl of a cold mixture of methanol:ethanol ( : , v:v) were added to each well, and the plates were covered with aluminum foil and incubated at - °c for h. cells were then scraped using a lifter (fisher scientific, hampton, nh, usa), and the supernatant was transferred to eppendorf tubes before centrifugation for min at , rpm at °c. the supernatant was transferred to another tube and dried down in a speedvac™ system (thermo fisher scientific, waltham, ma, usa). samples were reconstituted in water/acetonitrile : v/v with . % formic acid and centrifuged at , rpm for min at °c. the particle-free supernatant was recovered for further lc-ms analysis.

lc-ms data acquisition for metabolomics

metabolites were loaded into an eksigent nanolc system (ab sciex, foster city, ca, usa) with a halo phenyl-hexyl column ( . x mm, . µm, Å pore size, eksigent ab sciex, foster city, ca, usa) for data acquisition using the lc-ms parameters previously described with some modifications ( ). in brief, the separation of metabolites was performed using gradient elution with . % formic acid in water (a) and . % formic acid in acn (b) as mobile phases at a constant flow rate of µl/min. the gradient started with % b for min followed by a stepped increase to % b over min and held constant for min. solvent composition was returned to % b for . min. column re-equilibration was carried out with % mobile phase b for minutes. potential carryover was minimized with a blank run ( µl buffer a) between experimental samples. the eluate from the lc was delivered directly to the turbov source of a tripletof + mass spectrometer (ab sciex, foster city, ca, usa) using electrospray ionization (esi) under positive mode. esi source conditions were set as follows: ionspray voltage floating, v; source temperature, °c; curtain gas, psi; ion source gases and were set to and psi; declustering potential, v. data were acquired using information-dependent acquisition (ida) with high sensitivity mode selected, automatically switching between full-scan ms and ms/ms. the accumulation time for tof ms was . s/spectra over the m/z range - da and for ms/ms scan was . s/spectra over the m/z - da. the ida settings were as follows: charge state + to + , intensity cps, exclude isotopes within da, mass tolerance mda, and a maximum number of candidate ions . under ida settings, the ‘‘exclude former target ions’’ option was set as s after two occurrences and ‘‘dynamic background subtract’’ was selected. the manufacturer’s rolling collision energy (ce) option was used based on the size and charge of the precursor ion using the formula ce=m/z x . + . the instrument was automatically calibrated by the batch mode using appropriate positive tof ms and ms/ms calibration solutions before sample injection and after injection of two samples (< . working hours) to ensure a mass accuracy of < ppm for both ms and ms/ms data. instrument performance was monitored during data acquisition by including qc samples (pooled samples of equal volume) every experimental samples. data acquisition of experimental samples was also randomized.

metabolomics data processing

mass detection, chromatogram building and deconvolution, isotopic assignment, feature alignment, and gap-filling (to detect features missed during the initial alignment) from lc-ms datasets were performed using xcms (https://xcmsonline.scripps.edu) ( ) and mzmine ( ) software.
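The mass-accuracy criterion used during calibration is expressed in parts per million (ppm). As a minimal sketch of how such a check is usually computed, the snippet below derives the signed ppm error between a measured and a theoretical m/z. The function names and the tolerance argument are hypothetical and are not taken from the instrument software; the threshold itself is left as a parameter because its value is not legible in this copy of the text.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    if theoretical_mz <= 0:
        raise ValueError("theoretical m/z must be positive")
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz: float, theoretical_mz: float, max_ppm: float) -> bool:
    """True when the absolute ppm error does not exceed max_ppm.

    max_ppm is a caller-supplied assumption, not a value from the paper.
    """
    return abs(ppm_error(measured_mz, theoretical_mz)) <= max_ppm
```

For example, a calibrant expected at m/z 500.0000 and observed at 500.0025 has an error of about 5 ppm.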
the xcms pipeline was used for normalization of feature area and statistical analysis. to identify or annotate the metabolites at the chemical structure and class level, the ms -containing features extracted with mzmine were further analyzed using the global natural products social molecular networking (gnps) ( ), network annotation propagation (nap) ( ) and ms lda ( ) in silico annotation tools, and classyfire automated chemical classification ( ), as previously described ( ) with some modifications. the confidence levels of such annotations are level (probable structure by library spectrum match) and level (tentative candidates), in agreement with the metabolomics standards initiative (msi) classification ( ). molecular networking, nap, and classyfire outputs were integrated using the molnetenhancer workflow ( ). molecular networks were visualized using cytoscape version . . ( ). in addition, chemical substructures (co-occurring fragments and neutral losses referred to as “mass motifs” [m m]) were recognized using the ms lda web pipeline (http://www.ms lda.org) to further annotate metabolites (level , msi). the detailed processing parameters for the xcms and mzmine pipelines are found in the supporting information.

peptidomics data processing

for peptide identification, raw .wiff and .wiff.scan files (the same files used for mzmine and xcms) from the experimental and control groups were analyzed separately using proteinpilot software version . (ab sciex, foster city, ca, usa) with the paragon algorithm.
ms and ms data were searched against the bos taurus swissprot sequence database ( reviewed proteins + common protein contaminants, february release). the input parameters were: sample type, identification; digestion, none; cys alkylation, none; instrument, tripletof ; special factors, none; species, bos taurus; id focus, biological modifications and amino acid substitutions; search effort, thorough id. false discovery rate analysis was also performed. all peptides were exported and those with a > % confidence were linked to the corresponding feature extracted by the xcms algorithm using their accurate mass and retention time information. for peptide quantification, we employed the normalized feature abundances (ms level) generated by xcms. a significance threshold of p< . (welch’s t test) was utilized.

proteomics data reprocessing

the swath-based proteomics data (identifier pxd ), hosted in the proteomexchange consortium via pride ( ), were reanalyzed with some modifications. the parameters used to build the spectral library remained the same ( ), while the parameter for peptides per protein was set to in the software swath® acquisition microapp . in peakview® version . (ab sciex, foster city, ca, usa). the obtained protein peak areas were exported to markerview™ version . (ab sciex, foster city, ca, usa) for further data refinement, including assignment of ids to files and removal of reversed and common contaminants. peak areas were exported in a .tsv file and normalized with normalyzerde online version . . ( ). the normalyzerde pipeline comprises different normalization methods (log , variance stabilizing normalization, total intensity, median, mean, quantile, cycloess, and robust linear regression).
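To make one of the listed methods concrete, the following is a minimal sketch of quantile normalization in plain Python. It is an illustration of the general technique only, not the normalyzerde implementation: the function name is hypothetical, ties are resolved by sort order rather than by averaging, and missing values are assumed to be absent.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of samples (one list of intensities per sample).

    Every sample must contain the same number of values. After normalization
    all samples share the same distribution: the k-th smallest value in each
    sample is replaced by the mean of the k-th smallest values across samples.
    """
    n_values = len(samples[0])
    if any(len(s) != n_values for s in samples):
        raise ValueError("all samples must have the same length")

    # Mean of the k-th smallest value across samples, for every rank k.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [
        sum(s[k] for s in sorted_samples) / len(samples) for k in range(n_values)
    ]

    normalized = []
    for s in samples:
        # Indices of the sample ordered from smallest to largest value.
        order = sorted(range(n_values), key=lambda i: s[i])
        out = [0.0] * n_values
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

With two toy samples `[5, 2, 3]` and `[4, 1, 6]`, both outputs contain the same three values (1.5, 3.5, 5.5), placed according to each sample's original ranking.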
the results of qualitative (ma plots, scatter plots, box plots, density plots) and quantitative (pooled intragroup coefficient of variation [pcv], median absolute deviation [pmad], estimate of variance [pev]) parameters were compared between the normalization methods to select the most appropriate.

bioinformatic analysis of proteomics data

proteins that passed the significance threshold were first converted to their corresponding entrez gene (geneid) using https://www.uniprot.org/uploadlists/ and then transformed to their human equivalents using the ortholog conversion feature in https://biodbnet-abcc.ncifcrf.gov/db/dbortho.php. bioinformatic analysis was done on the omicsnet website platform (https://www.omicsnet.ca/) ( , ). first, a protein-protein interaction (ppi) molecular network (first-order network containing query or seed molecules and their immediate interacting partners) was built using the string ppi database ( ), and then pathway enrichment analysis was performed using the built-in reactome and kyoto encyclopedia of genes and genomes (kegg) databases. to visualize modules (functional units) contained in the molecular network, the walktrap algorithm (within the omicsnet platform) was employed. the hypergeometric test was used to compute p-values.

integrative analysis of proteomics and metabolomics data

the molecular interactions between the proteins and metabolites differentially abundant between hg + hi and ng were determined in omicsnet ( , ).
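The hypergeometric test used for pathway enrichment can be illustrated with a short, self-contained sketch. It computes the one-sided probability of seeing at least the observed overlap between a query list and a pathway when the query is drawn at random from a background population. The function and argument names are hypothetical, and how omicsnet defines the background set is not stated in the text, so the population size is simply a parameter here.

```python
from math import comb


def hypergeom_enrichment_p(population: int, pathway_size: int,
                           query_size: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    population   -- number of molecules in the background set (assumed input)
    pathway_size -- number of background molecules annotated to the pathway
    query_size   -- number of molecules in the query list
    overlap      -- number of query molecules annotated to the pathway
    """
    if not (0 <= pathway_size <= population and 0 <= query_size <= population):
        raise ValueError("pathway and query sizes must lie within the population")
    if overlap < 0:
        raise ValueError("overlap must be non-negative")

    total = comb(population, query_size)
    upper = min(pathway_size, query_size)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, i) * comb(population - pathway_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total
```

For a toy background of 10 molecules, a pathway of 4, and a query of 3 with 2 hits, the tail probability is (36 + 4) / 120 = 1/3.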
the lists of proteins (entrezgene id) and metabolites (hmdb id) were loaded to build a composite network using protein-protein (string database selected) and metabolite-protein (kegg database selected) interaction types. the primary network relied on the metabolite input. pathway enrichment analysis was performed using the built-in reactome and kegg databases. the hypergeometric test was used to compute p-values.

statistical analysis

all experiments were performed in triplicate. based on the accuracy (determination of real fold-changes) of swath-based quantification ( ), proteins with a fold change ≥ . or ≤ / . and a p-value < . (welch’s t-test) were considered as differentially abundant between ng and hg + hi conditions. for the metabolomics data, features with a fold change ≥ . or ≤ / . and a p-value < . (welch’s t-test) were considered as differentially abundant. we did not apply multiple-test corrections to calculate adjusted p-values, because this process could obscure proteins or metabolites with real changes (true positives) ( ). instead, the analysis was focused on top- enriched signaling pathways (adjusted p-value < . ), which allowed us to determine a set of interacting proteins and metabolites with relevant biological information and contributes to reducing false positives. for multivariate statistical analysis and heatmap visualization, metaboanalyst . (https://www.metaboanalyst.ca) was utilized. principal component analysis (pca) was used to assess sample clustering behavior and inter-group variation. no scaling was used for pca and heatmap analysis. prism . (graphpad software, san diego, ca) was used for the creation of volcano plots and column graphs.

data availability

the raw datasets supporting the metabolomics results are available in the gnps/massive public repository ( ) under the accession number msv . the specific parameters of the tools employed for metabolite annotation are available on the following links: for classical molecular networking, https://gnps.ucsd.edu/proteosafe/status.jsp?task= b d e a bc eebf b b; for fbmn, https://gnps.ucsd.edu/proteosafe/status.jsp?task= e e d df d ; for nap, https://proteomics .ucsd.edu/proteosafe/status.jsp?task= cda c df d a f afb ; for ms lda, http://ms lda.org/basicviz/summary/ / (need to log in as a registered or guest user); for molnetenhancer, https://gnps.ucsd.edu/proteosafe/status.jsp?task=de b c e ffab a fd. the quantitative results generated using the xcms platform can be accessed after logging into https://xcmsonline.scripps.edu and searching for the job number . swath data are accessible on proteomexchange with the dataset identifier pxd .

results

untargeted metabolomics

overall, features or potential metabolites were detected using xcms and mzmine, wherein (~ %) features were commonly identified in both platforms (figure a). based on the relative quantification using xcms, and features were detected with reduced and increased abundances, respectively, in the experimental group compared to the control group (figure b). the effects of hg and hi in the experimental group are observed by pca, wherein the experimental samples clustered away from the control group (figure c). the consistency of the lc-ms equipment is apparent from the clustering of the qc samples (figure c). further, the heatmap visualization of the top -modulated metabolites exhibited the different distribution patterns among groups (figure d).
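The counts of features with reduced or increased abundance rest on the two criteria given in the statistical analysis: a symmetric fold-change cutoff and Welch's t-test. The sketch below shows the arithmetic behind both using only the Python standard library. The function names are hypothetical and the cutoff is a parameter, since the numeric thresholds are not legible in this copy. Only the Welch statistic and its Welch-Satterthwaite degrees of freedom are computed; converting them to a p-value needs a t-distribution, for which a statistics library (for example `scipy.stats.ttest_ind` with `equal_var=False`) would normally be used.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(experimental, control):
    """Ratio of group means (experimental / control)."""
    return mean(experimental) / mean(control)


def passes_fold_change(fc: float, threshold: float) -> bool:
    """Symmetric cutoff: fc >= threshold or fc <= 1 / threshold."""
    return fc >= threshold or fc <= 1.0 / threshold


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Uses sample variances (n - 1 denominator) and does not assume equal
    variances or equal group sizes.
    """
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

With triplicates, as used throughout the study, the degrees of freedom fall between 2 and 4, so the t statistic must be fairly large before a feature passes any conventional p-value cutoff.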
using the gnps platform for automatic metabolite annotation, compounds (excluding duplicates and contaminants) were putatively annotated with a level confidence annotation (ms spectral match) (table s ) in agreement with the msi classification ( ). some metabolites identified by the gnps platform could not be quantified because they were not detected by the xcms algorithm during feature area normalization and quantification. moreover, gnps molecular networking aligned the ms -containing features (n= , ) based on their structural similarity, creating independent networks or clusters with at least two connected nodes (figure a). the use of the molnetenhancer workflow allowed the putative identification of chemical classes (level , msi) for of the independent networks. the top- most abundant annotated chemical classes and associated metabolites are shown in figure a. three clusters from the network were further analyzed because they contained metabolites annotated by spectral matching, which facilitates the annotation of the other nodes in the cluster. cluster revealed two metabolites linked to the organonitrogen compounds class with reduced abundance in the experimental group (figure b). library spectral match (level , msi) suggests pc( : / : ( z)) and pc( : / : ( z, z)) as putative candidates, which was supported by ms lda phosphocholine-substructure recognition (figure c). in cluster , glutathione-based metabolites (msi level ) were detected through fragments m/z . , . , . , and .
retrieved by the m m_ substructure and associated with the glutathione structure using mzcloud in silico predictions (figure a). the precursor ion at m/z . and glutathione (annotated at level , msi) were detected with increased abundance in the experimental group. ms lda visualization, at the m m level, correlated with the gnps molecular networking clustering (figure b). in cluster , various phenylalanine-based metabolites were putatively annotated, aided by ms lda substructure recognition (figure c and d). within this cluster, glutamyl-phenylalanine (annotated at level , msi) and the precursor ions at m/z . and . presented with increased abundance in the experimental vs. control group. in addition, various amino acids were annotated (level , msi) by gnps spectral matching and manual inspection of data (table s ). threonine, valine, proline, leucine, serine, glutamic acid, methionine, and tyrosine presented increased abundance (fold change range . - . , p< . ) in the experimental vs. control group. notably, metabolites linked to the catabolism of tryptophan via the serotonin and kynurenine pathways ( ) were annotated (level , msi), including melatonin, acetyl serotonin, and kynurenine (table s ). however, only kynurenine was significantly elevated in the experimental group. the full list of annotated metabolites, differential abundances, and other relevant feature information is shown in table s .

peptidomics

experimental and control datasets were analyzed separately to identify the peptides and their biological modifications. the complete list of peptides identified by proteinpilot in the experimental and control groups is described in table s . proline oxidation was the most frequent biological modification detected in the experimental group datasets. we identified and peptides with a confidence of > % in the control and experimental group, respectively.
differential abundance of proline-rich peptides was observed in the experimental group compared to the control group. an additional tripeptide was manually annotated with a lpp sequence (table s ).

proteomics

the re-analysis of the swath data (pxd dataset) facilitated the identification of quantifiable proteins ( proteins with at least unique peptides, % false discovery rate) with no missing values among technical and biological replicates (table s ). sample datasets were normalized using different methods to select the most appropriate based on quantitative and qualitative parameters on our dataset. quantile normalization produced a better qualitative and quantitative profile and was selected to further process our data (figure s ). pca of normalized data showed a clear separation of the groups, suggesting overall differences in their proteomes (figure a). differential abundance analysis revealed and proteins with increased and decreased abundance, respectively, in the experimental group (figure b). further, the heatmap visualization of the top -modulated proteins exhibited the different distribution patterns among the experimental and control groups (figure c). to obtain molecular insight, we performed a functional enrichment analysis using a network-based approach. first, we created a composite network comprising ppi between the proteins modulated by simulated diabetes (seed proteins) and their immediate interacting partners (highest confidence > . ) retrieved from the string database (incorporated in the omicsnet platform).
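The construction of a first-order network from seed proteins can be sketched in a few lines. The example below is only an illustration of the idea, under stated assumptions: interactions are given as (protein, protein, confidence) tuples, an edge is kept when its score exceeds a caller-supplied cutoff and at least one endpoint is a seed, and edges between two non-seed neighbours are not added. The function name, the protein labels, and the cutoff used in the usage note are hypothetical and do not reproduce the omicsnet implementation.

```python
def first_order_network(edges, seeds, min_score):
    """Return (nodes, kept_edges) for a first-order interaction network.

    edges     -- iterable of (protein_a, protein_b, confidence_score)
    seeds     -- iterable of seed protein identifiers
    min_score -- confidence cutoff; only edges scoring above it are kept
    """
    seed_set = set(seeds)
    kept_edges = [
        (a, b, score)
        for a, b, score in edges
        if score > min_score and (a in seed_set or b in seed_set)
    ]
    # Nodes are the seeds that have partners plus those immediate partners.
    nodes = set()
    for a, b, _ in kept_edges:
        nodes.add(a)
        nodes.add(b)
    return nodes, kept_edges
```

Calling it with edges `A-B (0.95)`, `B-C (0.99)`, `A-D (0.5)`, `E-A (0.91)`, seed `A`, and a cutoff of 0.9 keeps `A-B` and `E-A`: `B-C` is dropped because neither endpoint is a seed, and `A-D` because its score is too low.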
the principal network using the up-modulated proteins consisted of proteins, edges and seed proteins (nodes with blue shadow) and is illustrated in figure d. eight modules or clusters were generated that may represent relevant complexes or functional units ( ). the most significant (adjusted p-value < . ) reactome and kegg pathways on the global network are shown in table . two modules contained multiple seed proteins and were linked to dna/rna and protein metabolism pathways using the walktrap algorithm (figure d). on the other hand, the principal network using the down-modulated proteins consisted of proteins, edges and seed proteins, and contained eleven modules, wherein one module (with seed proteins) indicated associations with mitochondrial function pathways (figure e).

integration of metabolomics and proteomics

the signaling pathways perturbed by simulated diabetes were identified by a composite network of interacting metabolites and proteins using omicsnet built-in databases. figure illustrates the composite bi-layered metabolite-ppi network using the up-modulated molecules (under simulated diabetes), comprised of metabolites (seed metabolites), edges, and proteins ( seed proteins). the top-most enriched signaling pathways identified in the composite network are shown in table . the two principal modules highlighted by the walktrap algorithm were linked to glutathione and amino acid metabolism.
we noted a smaller interaction between acyl-protein thioesterase (lypla ) and a phosphatidylcholine metabolite when simultaneously analyzing up- and down-modulated proteins and metabolites. no significant composite network was identified using the down-modulated proteins and metabolites.

cellular morphology

to better understand the effects that simulated diabetes exerts on endothelial cells, changes in cellular structure endpoints were evaluated. the endothelial nuclei morphology in the bcaec control and experimental groups was evaluated using fluorescent staining and image analysis. we also evaluated the presence of vwf (a marker of endothelial cells) in bcaec and hcaec to reveal the cellular boundary and to demonstrate their endothelial phenotype ( ). we noted an increase in the percentage of binucleated bcaec in the experimental group compared to the control group (top panel, figure a and b). a similar result, with larger nuclei, was observed when using hcaec as a human in vitro model (bottom panel, figure a and b). finally, as expected, we observed a typical intracellular localization of vwf and a % positivity in endothelial cells.

discussion

this study investigated the molecular perturbations occurring in coronary endothelial cells subjected to prolonged simulated diabetes, which facilitated the identification of signaling pathways and specific molecules that could be associated with the development of cardiovascular disease. to achieve this, we employed a ms-based multi-omics approach coupled to fluorescence microscopy to detect structural changes. endothelial cells cover the inner surface of blood vessels and are distributed across the body. their functions include acting as a mechanical barrier between the circulating blood and adjacent tissues as well as modulating multiple functions in distinct organs ( ). these regulatory functions vary according to localization and vascular bed origin ( ).
hg blood levels are detrimental to endothelial cell function in t dm, leading to coronary endothelial dysfunction and the development of cvd ( , ). the molecular effects of hg on endothelial cells have been previously characterized ( , , , , ); nevertheless, the endothelial cell types used in these studies are not intrinsically involved in cvd. the present study used an in vitro model involving endothelial cells that modulate heart function, caec ( ). our model not only used hg ( mmol/l) to simulate diabetes ( , , , , ) but also first induced insulin resistance to mimic the pathophysiological conditions that occur in t dm, wherein hyperinsulinemia precedes hyperglycemia ( ). diabetes was simulated for up to days to mimic chronic hg exposure and to avoid measuring the cell proliferation known to occur in early hg ( , ). despite the lack of an apparent increase in cell proliferation in the experimental group compared to the control group after twelve days, an increase in overall protein abundance was detected by bradford assay (data not shown) and inferred from the total ion chromatogram (tic) of ms (figure s a). we suggest that protein synthesis is increased as a consequence of the higher presence of bi-nucleated caec (with increased dna/rna metabolism) under hg + hi compared to the control cohort (figure a and b). previous studies have shown reduced endothelial cell proliferation (mostly in huvec) after long-term ( - days) hg exposure ( , , - ), accompanied by an increase in protein synthesis ( ).
this ms-based methodological pipeline, which included appropriate controls during data acquisition (qc) and processing (e.g., normalization, filtering, annotation, and dereplication), allowed the identification of global changes in the metabolome of caec under hg + hi. specifically, increased abundance of valine, leucine, tyrosine, serine, leucine, proline, methionine, and glutamic acid was observed in cells under hg conditions, which is consistent with reports on human aortic endothelial cells ( ). notably, several clinical studies have established a direct relationship between the prevalence/incidence of t dm and increased levels of valine, leucine and tyrosine in serum and plasma ( - ). our results support the role of caec in contributing to the elevated pool of amino acids seen in circulation under an hg environment. we speculate that increased levels of these amino acids could result from either increased production or reduced degradation, as suggested in endothelial cells (immortalized cell line, ea.hy ) that transition from a glycolytic metabolism towards lipid and amino acid oxidation when challenged by hg ( ). furthermore, evidence of increased tryptophan catabolism through the kynurenine pathway was identified. in this regard, a non-significant decrease of ~ % in the abundance of tryptophan was detected. however, a significant increase of ~ % in kynurenine (tryptophan’s main metabolite) ( ) in the hg + hi group relative to the ng group was also observed, which is a key finding because elevated plasma levels of kynurenine are known to increase cvd risk ( , ). this novel finding expands the understanding of amino acid metabolism in endothelial cells under simulated diabetes. acetyl serotonin and melatonin, which are components of the serotonin pathway that degrades tryptophan ( ), were also detected, with only minor increases in abundance ( - %) in the hg + hi group compared to control.
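the percentage differences quoted above compare metabolite abundance between the hg + hi and ng groups. a minimal sketch of one way such a figure can be computed is given below; it assumes the percentage is the change in group mean relative to the control mean, which this section does not confirm, and the abundance values and function name are hypothetical.

```python
from statistics import mean


def percent_change(control, treated):
    """Percent change of the treated group mean relative to the control mean."""
    control_mean = mean(control)
    if control_mean == 0:
        raise ValueError("control mean is zero; percent change is undefined")
    return 100.0 * (mean(treated) - control_mean) / control_mean


# Hypothetical normalized abundances for one metabolite (not study data).
ng_group = [100.0, 110.0, 90.0]        # control replicates
hg_hi_group = [150.0, 160.0, 140.0]    # simulated-diabetes replicates

change = percent_change(ng_group, hg_hi_group)
```

a positive value indicates higher abundance under simulated diabetes and a negative value indicates lower abundance; whether a given change is statistically significant has to be assessed separately.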
differences in glutathione (cysteine-glutamic acid-glycine, tripeptide) metabolism in caec were also found, suggesting an increased response to oxidative stress ( ). in line with this observation, previous research reported a glutathione-dependent reaction to ambient hg in artery-derived endothelial cells ( , ), but the same could not be observed in vein-derived endothelial cells ( , ). this emphasizes the different responses to hg among endothelial phenotypes. here, novel evidence is provided of the up-regulation of glutathione-based metabolites. the composite protein network suggested an increase in glutathione metabolism, supported by elevated levels of oxidized glutathione and of glutamic acid, one of its synthetic precursors. at the protein level, peroxiredoxin (prdx and prdx ) and thioredoxin (txn , mitochondrial), which are part of the cell's natural enzymatic defense against oxidative stress ( ), showed increased abundances in the experimental group. the substructure analysis of metabolomics data facilitated the identification of glutamic acid- and phenylalanine-based metabolites, presumably di- or tri-peptides, including the annotated metabolite glutamyl-phenylalanine. furthermore, the caec peptidome analysis suggested an increase in proline-containing peptides. this type of peptide is of particular interest because of its resistance to non-specific proteolytic degradation, body distribution and remarkable biological effects ( - ). yet, the precise function of such phenylalanine-, glutamine-, and proline-based peptides remains to be characterized in caec.
we can only speculate that they are the result of a compensatory mechanism to reduce glucose-induced cellular damage. also, increased protein abundance of core and regulatory subunits of the proteasome complex (psma and psmd ) was found in cells under simulated diabetes. this suggests increased protein degradation and subsequent peptide formation in response to hg. metabolomic profiling also revealed changes in the lipidome of caec challenged with hg + hi, wherein a reduction in phosphatidylcholine (pc) lipids and a subsequent increase in phosphocholine were noted. changes in the phospholipidomic profile of bovine aortic endothelial cells treated with hg for h have also been reported in a lipidome study ( ). here, proteomics and metabolomics data were manually integrated, which allowed us to determine critical roles for pafah b and lypla in mediating the degradation of pc lipids (figure ). pafah b was found to be up-regulated in this study and is known to be associated with inflammation and higher levels of lysopc ( ). as a result, pafah b could increase the pool of lysopc lipids, further exacerbating inflammation in the cardiovascular system ( ). on the other hand, lypla has a lysophospholipase activity that can hydrolyze a range of lysophospholipids, including lysopc, thereby generating a fatty acid and glycerophosphocholine as products ( ). increased levels of phosphocholine (~ %) were detected in hg-treated cells compared to control, which could be associated with the degradation of lysopc lipids. it should be noted that the use of pathway databases such as kegg and reactome has some limitations when dealing with lipid metabolites because their chemical diversity is not well annotated/defined within the databases. for example, kegg provides a chemical class identifier instead of an individual identity for lipids, which constrains the assessment of their biological importance ( ). thus, based on our manual inspection of the metabolomics-proteomics data and in line with the evidence, we suggest that simulated diabetes evokes inflammation in bcaec and that pafah b and lypla play a role in modulating this process. previously, we reported the multinucleation of caec cultured under simulated diabetes ( ). this type of cell possesses ≥ nuclei. here, we replicated our previous findings of increased binucleation in bcaec. the same outcome was obtained when using hcaec as a human in vitro model (figure a and b), validating the binucleation process in other caec. after refinement of lc-ms data and bioinformatics re-processing of published swath-based datasets of bcaec under simulated diabetes ( ), molecular signatures and pathways that could be linked to the binucleation process were found (figure ). for instance, under simulated diabetes we noted an increased abundance of proteins with reported nuclear localization and links to dna metabolism, including ribosomal proteins rps , rps , and rpl ( ). further, we observed an increased abundance of proteasome proteins, psma and psmd , which are linked to protein metabolism ( ). hence, we infer that caec binucleation occurs as a compensatory mechanism to increase the cell's capacity to metabolize the excess of ambient glucose by increasing the cell's metabolic machinery (transcription/translation processes). although an increase in cell proliferation could boost a coordinated increase of ribosomal and proteasome proteins, we do not believe this is the case here, as mentioned before.
after - days of simulated diabetes, cells occupied % of the plate well surface, leaving no room to harbor more cells because endothelial cells grow as a monolayer. this is consistent with findings stating that when endothelial cells become highly confluent, they stop growing due to cell-cell contact, even in the presence of growth factors ( ). in support of this, up-stream (ctgf and cd ) ( , ) (table s ) and down-stream proteins (fabp ) ( ) (table s ) involved in angiogenesis and proliferation were down-regulated by simulated diabetes. importantly, there is evidence (not in endothelial cells) of cellular processes that stimulate cellular binucleation without increases in cell proliferation, including cellular enhancement of antimicrobial defenses ( ), senescence ( ), and malignancy ( ). various mechanisms have been linked to the binucleation process, such as cytokinesis failure, cellular fusion, mitotic slippage, and endoreduplication ( ). the elucidation of the exact molecular mechanisms leading to the binucleation of caec is beyond the scope of our study.

in conclusion, this study applied an integrated multi-omics and bioinformatics/chemoinformatics approach to characterize the molecular perturbations that simulated diabetes exerts on caec. we confirmed the findings of several independent studies that reported alterations at the protein and metabolite levels in endothelial cells from sources other than coronary vessels. metabolomics identified alterations in amino acid, peptide, and phospholipid metabolism.
notably, the chemoinformatic analysis identified previously unreported alterations of phenylalanine-, glutathione-, and proline-based peptides in coronary endothelium under simulated diabetes. proteomics provided evidence of reduced mitochondrial mass and angiogenesis. the integration of proteomics and metabolomics identified increased glutamic acid metabolism and suggested that antioxidant enzymes are involved in protecting the cells from oxidative stress. fluorescence microscopy revealed the appearance of non-proliferative binucleated caec as a means to metabolize the excess of ambient glucose. overall, our study improved the understanding of the molecular disturbances caused by simulated diabetes that could mediate caec dysfunction and may be relevant in the context of cvd in subjects with t dm.

. acknowledgements

this work was derived in part from the thesis project of h.c.d.h. at the posgrado en ciencias de la vida, cicese. we thank alan g. hernández-melgar for his invaluable technical assistance with the normalyzerde software.

. funding

part of this work was supported by cicese (grant no. to amu and internal project no. - from cad), nih r dk (to fv), and va merit-i bx (to fv).
. conflict of interest

dr. villarreal is a co-founder and stockholder of cardero therapeutics, inc.

. author contributions

a.m.u. contributed to the study conception and design, data acquisition, formal analysis, methodology, project administration, and funding acquisition. h.c.d.h., l.d.m., and r.a.c.c. contributed to the data acquisition, formal analysis and interpretation of some experiments. c.a.d. and f.v. contributed to funding acquisition and resources. o.m.p. contributed to data interpretation and critical revision of the manuscript. all authors contributed to the drafting, revising, and approval of the final version of the manuscript.

references

. halcox, j. p.; schenke, w. h.; zalos, g.; mincemoyer, r.; prasad, a.; waclawiw, m. a.; nour, k. r.; quyyumi, a.
a., prognostic value of coronary vascular endothelial dysfunction. circulation , , ( ), - . . lind, m.; wedel, h.; rosengren, a., excess mortality among persons with type diabetes. n engl j med , , ( ), - . . gutierrez, e.; flammer, a. j.; lerman, l. o.; elizaga, j.; lerman, a.; fernandez-aviles, f., endothelial dysfunction over the course of coronary artery disease. eur heart j , , ( ), - . . lorenzi, m.; cagliero, e.; toledo, s., glucose toxicity for human endothelial cells in culture. delayed replication, disturbed cell cycle, and accelerated death. diabetes , , ( ), - . . kageyama, s.; yokoo, h.; tomita, k.; kageyama-yahara, n.; uchimido, r.; matsuda, n.; yamamoto, s.; hattori, y., high glucose-induced apoptosis in human coronary artery endothelial cells involves up-regulation of death receptors. cardiovasc diabetol , , . . dubois, s.; madec, a. m.; mesnier, a.; armanet, m.; chikh, k.; berney, t.; thivolet, c., glucose inhibits angiogenesis of isolated human pancreatic islets. j mol endocrinol , , ( ), - . . lorenzi, m.; montisano, d. f.; toledo, s.; barrieux, a., high glucose induces dna damage in cultured human endothelial cells. j clin invest , , ( ), - . . patel, h.; chen, j.; das, k. c.; kavdia, m., hyperglycemia induces differential change in oxidative stress at gene expression and functional levels in huvec and hmvec. cardiovasc diabetol , , . . pala, l.; pezzatini, a.; dicembrini, i.; ciani, s.; gelmini, s.; vannelli, b. g.; cresci, b.; mannucci, e.; rotella, c. m., different modulation of dipeptidyl peptidase- activity between microvascular and macrovascular human endothelial cells. acta diabetol , suppl , s - . . esposito, c.; fasoli, g.; plati, a. r.; bellotti, n.; conte, m. m.; cornacchia, f.; foschi, a.; mazzullo, t.; semeraro, l.; dal canton, a., long-term exposure to high glucose up-regulates vcam-induced endothelial cell adhesiveness to pbmc. kidney int , , ( ), - . . baumgartner-parzer, s. 
m.; wagner, l.; pettermann, m.; grillari, j.; gessl, a.; waldhausl, w., high-glucose--triggered apoptosis in cultured endothelial cells. diabetes , , ( ), - . . ramirez-sanchez, i.; rodriguez, a.; moreno-ulloa, a.; ceballos, g.; villarreal, f., (-)-epicatechin-induced recovery of mitochondria from simulated diabetes: potential role of endothelial nitric oxide synthase. diab vasc dis res , , ( ), - . . liu, t.; gong, j.; chen, y.; jiang, s., periodic vs constant high glucose in inducing pro-inflammatory cytokine expression in human coronary artery endothelial cells. inflamm res , , ( ), - . . liu, t. s.; pei, y. h.; peng, y. p.; chen, j.; jiang, s. s.; gong, j. b., oscillating high glucose enhances oxidative stress and apoptosis in human coronary artery endothelial cells. j endocrinol invest , , ( ), - . . hilda carolina delgado de la herrán, l. d.-m., carolina Álvarez-delgado, francisco villarreal, aldo moreno-ulloa, formation of multinucleated variant endothelial cells with altered mitochondrial function in cultured coronary endothelium under simulated diabetes. biorxiv . . li, x. x.; liu, y. m.; li, y. j.; xie, n.; yan, y. f.; chi, y. l.; zhou, l.; xie, s. y.; wang, p. y., high glucose concentration induces endothelial cell proliferation by regulating cyclin-d -related mir- . j cell mol med , , ( ), - . . madonna, r.; de caterina, r., prolonged exposure to high insulin impairs the endothelial pi -kinase/akt/nitric oxide signalling. thromb haemost , , ( ), - . . zaccardi, f.; webb, d. r.; yates, t.; davies, m.
j., pathophysiology of type and type diabetes mellitus: a -year perspective. postgrad med j , , ( ), - . . moreno-ulloa, a.; miranda-cervantes, a.; licea-navarro, a.; mansour, c.; beltran-partida, e.; donis-maturano, l.; delgado de la herran, h. c.; villarreal, f.; alvarez-delgado, c., (-)-epicatechin stimulates mitochondrial biogenesis and cell growth in c c myotubes via the g-protein coupled estrogen receptor. eur j pharmacol , , - . . kirkwood, j. s.; maier, c.; stevens, j. f., simultaneous, untargeted metabolic profiling of polar and nonpolar metabolites by lc-q-tof mass spectrometry. curr protoc toxicol , chapter , unit . . moreno-ulloa, a.; sicairos diaz, v.; tejeda-mora, j. a.; macias contreras, m. i.; castillo, f. d.; guerrero, a.; gonzalez sanchez, r.; mendoza-porras, o.; vazquez duhalt, r.; licea-navarro, a., chemical profiling provides insights into the metabolic machinery of hydrocarbon-degrading deep-sea microbes. msystems , , ( ). . gowda, h.; ivanisevic, j.; johnson, c. h.; kurczy, m. e.; benton, h. p.; rinehart, d.; nguyen, t.; ray, j.; kuehl, j.; arevalo, b.; westenskow, p. d.; wang, j.; arkin, a. p.; deutschbauer, a. m.; patti, g. j.; siuzdak, g., interactive xcms online: simplifying advanced metabolomic data processing and subsequent statistical analyses. anal chem , , ( ), - . . pluskal, t.; castillo, s.; villar-briones, a.; oresic, m., mzmine : modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. bmc bioinformatics , , . . aron, a. t.; gentry, e. c.; mcphail, k. l.; nothias, l. f.; nothias-esposito, m.; bouslimani, a.; petras, d.; gauglitz, j. m.; sikora, n.; vargas, f.; van der hooft, j. j. j.; ernst, m.; kang, k. b.; aceves, c. m.; caraballo-rodriguez, a. m.; koester, i.; weldon, k. c.; bertrand, s.; roullier, c.; sun, k.; tehan, r. m.; boya, p. c.; christian, m. h.; gutierrez, m.; ulloa, a. m.; tejeda mora, j.
a.; mojica-flores, r.; lakey-beitia, j.; vasquez-chaves, v.; zhang, y.; calderon, a. i.; tayler, n.; keyzers, r. a.; tugizimana, f.; ndlovu, n.; aksenov, a. a.; jarmusch, a. k.; schmid, r.; truman, a. w.; bandeira, n.; wang, m.; dorrestein, p. c., reproducible molecular networking of untargeted mass spectrometry data using gnps. nat protoc . . da silva, r. r.; wang, m.; nothias, l. f.; van der hooft, j. j. j.; caraballo-rodriguez, a. m.; fox, e.; balunas, m. j.; klassen, j. l.; lopes, n. p.; dorrestein, p. c., propagating annotations of molecular networks using in silico fragmentation. plos comput biol , , ( ), e . . van der hooft, j. j.; wandy, j.; barrett, m. p.; burgess, k. e.; rogers, s., topic modeling for untargeted substructure exploration in metabolomics. proc natl acad sci u s a , , ( ), - . . djoumbou feunang, y.; eisner, r.; knox, c.; chepelev, l.; hastings, j.; owen, g.; fahy, e.; steinbeck, c.; subramanian, s.; bolton, e.; greiner, r.; wishart, d. s., classyfire: automated chemical classification with a comprehensive, computable taxonomy. j cheminform , , . . schymanski, e. l.; jeon, j.; gulde, r.; fenner, k.; ruff, m.; singer, h. p.; hollender, j., identifying small molecules via high resolution mass spectrometry: communicating confidence. environ sci technol , , ( ), - . . ernst, m.; kang, k. b.; caraballo-rodriguez, a. m.; nothias, l. f.; wandy, j.; chen, c.; wang, m.; rogers, s.; medema, m. h.; dorrestein, p. c.; van der hooft, j. j. j., molnetenhancer: enhanced molecular networks by integrating metabolome mining and annotation tools. metabolites , , ( ). .
shannon, p.; markiel, a.; ozier, o.; baliga, n. s.; wang, j. t.; ramage, d.; amin, n.; schwikowski, b.; ideker, t., cytoscape: a software environment for integrated models of biomolecular interaction networks. genome res , , ( ), - . . perez-riverol, y.; csordas, a.; bai, j.; bernal-llinares, m.; hewapathirana, s.; kundu, d. j.; inuganti, a.; griss, j.; mayer, g.; eisenacher, m.; perez, e.; uszkoreit, j.; pfeuffer, j.; sachsenberg, t.; yilmaz, s.; tiwary, s.; cox, j.; audain, e.; walzer, m.; jarnuczak, a. f.; ternent, t.; brazma, a.; vizcaino, j. a., the pride database and related tools and resources in : improving support for quantification data. nucleic acids res , , (d ), d - d . . willforss, j.; chawade, a.; levander, f., normalyzerde: online tool for improved normalization of omics expression data and high-sensitivity differential expression analysis. j proteome res , , ( ), - . . zhou, g.; xia, j., using omicsnet for network integration and d visualization. curr protoc bioinformatics , , ( ), e . . zhou, g.; xia, j., omicsnet: a web-based tool for creation and visual analysis of biological networks in d space. nucleic acids res , , (w ), w -w . . szklarczyk, d.; franceschini, a.; wyder, s.; forslund, k.; heller, d.; huerta-cepas, j.; simonovic, m.; roth, a.; santos, a.; tsafou, k. p.; kuhn, m.; bork, p.; jensen, l. j.; von mering, c., string v : protein-protein interaction networks, integrated over the tree of life. nucleic acids res , , (database issue), d - . . muntel, j.; kirkpatrick, j.; bruderer, r.; huang, t.; vitek, o.; ori, a.; reiter, l., comparison of protein quantification in a complex background by dia and tmt workflows with fixed instrument time. j proteome res , , ( ), - . . pascovici, d.; handler, d. c.; wu, j. x.; haynes, p. a., multiple testing corrections in quantitative proteomics: a useful but blunt tool. proteomics , , ( ), - . . wang, m.; carver, j. j.; phelan, v. v.; sanchez, l. m.; garg, n.; peng, y.; nguyen, d. 
d.; watrous, j.; kapono, c. a.; luzzatto-knaan, t.; porto, c.; bouslimani, a.; melnik, a. v.; meehan, m. j.; liu, w. t.; crusemann, m.; boudreau, p. d.; esquenazi, e.; sandoval-calderon, m.; kersten, r. d.; pace, l. a.; quinn, r. a.; duncan, k. r.; hsu, c. c.; floros, d. j.; gavilan, r. g.; kleigrewe, k.; northen, t.; dutton, r. j.; parrot, d.; carlson, e. e.; aigle, b.; michelsen, c. f.; jelsbak, l.; sohlenkamp, c.; pevzner, p.; edlund, a.; mclean, j.; piel, j.; murphy, b. t.; gerwick, l.; liaw, c. c.; yang, y. l.; humpf, h. u.; maansson, m.; keyzers, r. a.; sims, a. c.; johnson, a. r.; sidebottom, a. m.; sedio, b. e.; klitgaard, a.; larson, c. b.; p, c. a. b.; torres-mendoza, d.; gonzalez, d. j.; silva, d. b.; marques, l. m.; demarque, d. p.; pociute, e.; o'neill, e. c.; briand, e.; helfrich, e. j. n.; granatosky, e. a.; glukhov, e.; ryffel, f.; houson, h.; mohimani, h.; kharbush, j. j.; zeng, y.; vorholt, j. a.; kurita, k. l.; charusanti, p.; mcphail, k. l.; nielsen, k. f.; vuong, l.; elfeki, m.; traxler, m. f.; engene, n.; koyama, n.; vining, o. b.; baric, r.; silva, r. r.; mascuch, s. j.; tomasi, s.; jenkins, s.; macherla, v.; hoffman, t.; agarwal, v.; williams, p. g.; dai, j.; neupane, r.; gurr, j.; rodriguez, a. m. c.; lamsa, a.; zhang, c.; dorrestein, k.; duggan, b. m.; almaliti, j.; allard, p. m.; phapale, p.; nothias, l. f.; alexandrov, t.; litaudon, m.; wolfender, j. l.; kyle, j. e.; metz, t. o.; peryea, t.; nguyen, d. t.; vanleer, d.; shinn, p.; jadhav, a.; muller, r.; waters, k. m.; shi, w.; liu, x.; zhang, l.; knight, r.; jensen, p. r.; palsson, b. o.; pogliano, k.; linington, r. g.; gutierrez, m.; lopes, n. p.; gerwick, w. h.; moore, b. s.; dorrestein, p. c.; bandeira, n., sharing and community curation of mass spectrometry data with global natural products social molecular networking. nat biotechnol , , ( ), - . . bender, d. a., biochemistry of tryptophan in health and disease. mol aspects med , , ( ), - . . poyatos, j. f.; hurst, l. d., how biologically relevant are interaction-based modules in protein networks? genome biol , , ( ), r . . muller, a. m.; hermanns, m. i.; skrzynski, c.; nesslinger, m.; muller, k. m.; kirkpatrick, c. j., expression of the endothelial markers pecam- , vwf, and cd in vivo and in vitro. exp mol pathol , , ( ), - . . aird, w. c., phenotypic heterogeneity of the endothelium: ii. representative vascular beds. circ res , , ( ), - . . aird, w. c., endothelial cell heterogeneity. cold spring harb perspect med , , ( ), a . . widlansky, m. e.; gokce, n.; keaney, j. f., jr.; vita, j. a., the clinical implications of endothelial dysfunction. j am coll cardiol , , ( ), - . . ganz, p.; vita, j. a., testing endothelial vasomotor function: nitric oxide, a multipotent molecule. circulation , , ( ), - . . paulus, w. j.; vantrimpont, p. j.; shah, a. m., paracrine coronary endothelial control of left ventricular function in humans. circulation , , ( ), - . . abe, m.; ono, j.; sato, y.; okeda, t.; takaki, r., effects of glucose and insulin on cultured human microvascular endothelial cells. diabetes res clin pract , , ( ), - . . du, x. l.; sui, g. z.; stockklauser-farber, k.; weiss, j.; zink, s.; schwippert, b.; wu, q. x.; tschope, d.; rosen, p., introduction of apoptosis by high proinsulin and glucose in cultured human umbilical vein endothelial cells is mediated by reactive oxygen species. diabetologia , , ( ), - . . graier, w. f.; grubenthal, i.; dittrich, p.; wascher, t. c.; kostner, g.
m., intracellular mechanism of high d-glucose-induced modulation of vascular cell proliferation. eur j pharmacol , , ( ), - . . kamal, k.; du, w.; mills, i.; sumpio, b. e., antiproliferative effect of elevated glucose in human microvascular endothelial cells. j cell biochem , , ( ), - . . lorenzi, m.; nordberg, j. a.; toledo, s., high glucose prolongs cell-cycle traversal of cultured human endothelial cells. diabetes , , ( ), - . . quagliaro, l.; piconi, l.; assaloni, r.; martinelli, l.; motz, e.; ceriello, a., intermittent high glucose enhances apoptosis related to oxidative stress in human umbilical vein endothelial cells: the role of protein kinase c and nad(p)h-oxidase activation. diabetes , , ( ), - . . mcginn, s.; poronnik, p.; king, m.; gallery, e. d.; pollock, c. a., high glucose and endothelial cell growth: novel effects independent of autocrine tgf-beta and hyperosmolarity. am j physiol cell physiol , , ( ), c - . . yuan, w.; zhang, j.; li, s.; edwards, j. l., amine metabolomics of hyperglycemic endothelial cells using capillary lc-ms with isobaric tagging. j proteome res , , ( ), - . . chen, s.; akter, s.; kuwahara, k.; matsushita, y.; nakagawa, t.; konishi, m.; honda, t.; yamamoto, s.; hayashi, t.; noda, m.; mizoue, t., serum amino acid profiles and risk of type diabetes among japanese adults in the hitachi health study. sci rep , , ( ), . . lai, m.; liu, y.; ronnett, g. v.; wu, a.; cox, b. j.; dai, f. f.; rost, h. l.; gunderson, e. p.; wheeler, m. b., amino acid and lipid metabolism in post-gestational diabetes and progression to type diabetes: a metabolic profiling study. plos med , , ( ), e . . lu, y.; wang, y.; liang, x.; zou, l.; ong, c. n.; yuan, j. m.; koh, w. p.; pan, a., serum amino acids in association with prevalent and incident type diabetes in a chinese population. metabolites , , ( ). . menni, c.; fauman, e.; erte, i.; perry, j. r.; kastenmuller, g.; shin, s. y.; petersen, a. k.; hyde, c.; psatha, m.; ward, k. j.; yuan, w.; milburn, m.; palmer, c. n.; frayling, t. m.; trimmer, j.; bell, j. t.; gieger, c.; mohney, r. p.; brosnan, m. j.; suhre, k.; soranzo, n.; spector, t. d., biomarkers for type diabetes and impaired fasting glucose using a nontargeted metabolomics approach. diabetes , , ( ), - . . wang, t. j.; larson, m. g.; vasan, r. s.; cheng, s.; rhee, e. p.; mccabe, e.; lewis, g. d.; fox, c. s.; jacques, p. f.; fernandez, c.; o'donnell, c. j.; carr, s. a.; mootha, v. k.; florez, j. c.; souza, a.; melander, o.; clish, c. b.; gerszten, r. e., metabolite profiles and the risk of developing diabetes. nat med , , ( ), - . . koziel, a.; woyda-ploszczyca, a.; kicinska, a.; jarmuszkiewicz, w., the influence of high glucose on the aerobic metabolism of endothelial ea.hy cells. pflugers arch , , ( ), - . . badawy, a. a., kynurenine pathway of tryptophan metabolism: regulatory and functional aspects. int j tryptophan res , , . . pedersen, e. r.; tuseth, n.; eussen, s. j.; ueland, p. m.; strand, e.; svingen, g. f.; midttun, o.; meyer, k.; mellgren, g.; ulvik, a.; nordrehaug, j. e.; nilsen, d. w.; nygard, o., associations of plasma kynurenines with risk of acute myocardial infarction in patients with stable angina pectoris. arterioscler thromb vasc biol , , ( ), - . . sulo, g.; vollset, s. e.; nygard, o.; midttun, o.; ueland, p. m.; eussen, s. j.; pedersen, e. r.; tell, g. s., neopterin and kynurenine-tryptophan ratio as predictors of coronary events in older adults, the hordaland health study. int j cardiol , , ( ), - . . polyzos, k. a.; ketelhuth, d.
f., the role of the kynurenine pathway of tryptophan metabolism in cardiovascular disease. an emerging field. hamostaseologie , , ( ), - . . aquilano, k.; baldelli, s.; ciriolo, m. r., glutathione: new roles in redox signaling for an old antioxidant. front pharmacol , , . . yuan, w.; edwards, j. l., thiol metabolomics of endothelial cells using capillary liquid chromatography mass spectrometry with isotope coded affinity tags. j chromatogr a , , ( ), - . . weidig, p.; mcmaster, d.; bayraktutan, u., high glucose mediates pro-oxidant and antioxidant enzyme activities in coronary endothelial cells. diabetes obes metab , , ( ), - . . felice, f.; lucchesi, d.; di stefano, r.; barsotti, m. c.; storti, e.; penno, g.; balbarini, a.; del prato, s.; pucci, l., oxidative stress in response to high glucose levels in endothelial cells and in endothelial progenitor cells: evidence for differential glutathione peroxidase- expression. microvasc res , , ( ), - . . kashiwagi, a.; asahina, t.; ikebuchi, m.; tanaka, y.; takagi, y.; nishio, y.; kikkawa, r.; shigeta, y., abnormal glutathione metabolism and increased cytotoxicity caused by h o in human umbilical vein endothelial cells cultured in high glucose medium. diabetologia , , ( ), - . . hanschmann, e. m.; godoy, j. r.; berndt, c.; hudemann, c.; lillig, c. h., thioredoxins, glutaredoxins, and peroxiredoxins--molecular mechanisms and health significance: from cofactors to antioxidants to redox signaling. antioxid redox signal , , ( ), - . . scocchi, m.; tossi, a.; gennaro, r., proline-rich antimicrobial peptides: converging to a non-lytic mechanism of action. cell mol life sci , , ( ), - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . 
http://creativecommons.org/licenses/by-nc-nd/ . / . migliaccio, a.; castoria, g.; de falco, a.; bilancio, a.; giovannelli, p.; di donato, m.; marino, i.; yamaguchi, h.; appella, e.; auricchio, f., polyproline and tat transduction peptides in the study of the rapid actions of steroid receptors. steroids , , ( ), - . . radicioni, g.; stringaro, a.; molinari, a.; nocca, g.; longhi, r.; pirolli, d.; scarano, e.; iavarone, f.; manconi, b.; cabras, t.; messana, i.; castagnola, m.; vitali, a., characterization of the cell penetrating properties of a human salivary proline-rich peptide. biochim biophys acta , , ( pt a), - . . vanhoof, g.; goossens, f.; de meester, i.; hendriks, d.; scharpe, s., proline motifs in peptides and their biological processing. faseb j , , ( ), - . . colombo, s.; melo, t.; martinez-lopez, m.; carrasco, m. j.; domingues, m. r.; perez- sala, d.; domingues, p., phospholipidome of endothelial cells shows a different adaptation response upon oxidative, glycative and lipoxidative stress. sci rep , , ( ), . . de keyzer, d.; karabina, s. a.; wei, w.; geeraert, b.; stengel, d.; marsillach, j.; camps, j.; holvoet, p.; ninio, e., increased pafah and oxidized lipids are associated with inflammation and atherosclerosis in hypercholesterolemic pigs. arterioscler thromb vasc biol , , ( ), - . . tselepis, a. d.; john chapman, m., inflammation, bioactive lipids and atherosclerosis: potential roles of a lipoprotein-associated phospholipase a , platelet activating factor- acetylhydrolase. atheroscler suppl , , ( ), - . . wang, a.; dennis, e. a., mammalian lysophospholipases. biochim biophys acta , , ( ), - . . marco-ramell, a.; palau-rodriguez, m.; alay, a.; tulipani, s.; urpi-sarda, m.; sanchez- pla, a.; andres-lacueva, c., evaluation and comparison of bioinformatic tools for the enrichment analysis of metabolomics data. bmc bioinformatics , , ( ), . . zhou, x.; liao, w. j.; liao, j. m.; liao, p.; lu, h., ribosomal proteins: functions beyond the ribosome. 
j mol cell biol , , ( ), - . . goldberg, a. l., protein degradation and protection against misfolded or damaged proteins. nature , , ( ), - . . vinals, f.; pouyssegur, j., confluence of vascular endothelial cells induces cell cycle exit by inhibiting p /p mitogen-activated protein kinase activity. mol cell biol , , ( ), - . . yu, y.; moulton, k. s.; khan, m. k.; vineberg, s.; boye, e.; davis, v. m.; o'donnell, p. e.; bischoff, j.; milstone, d. s., e-selectin is required for the antiangiogenic activity of endostatin. proc natl acad sci u s a , , ( ), - . . brigstock, d. r., regulation of angiogenesis and endothelial cell function by connective tissue growth factor (ctgf) and cysteine-rich (cyr ). angiogenesis , , ( ), - . . elmasri, h.; ghelfi, e.; yu, c. w.; traphagen, s.; cernadas, m.; cao, h.; shi, g. p.; plutzky, j.; sahin, m.; hotamisligil, g.; cataltepe, s., endothelial cell-fatty acid binding protein promotes angiogenesis: role of stem cell factor/c-kit pathway. angiogenesis , , ( ), - . . quinn, m. t.; schepetkin, i. a., role of nadph oxidase in formation and function of multinucleated giant cells. j innate immun , , ( ), - . . holt, d. j.; grainger, d. w., multinucleated giant cells from fibroblast cultures. biomaterials , , ( ), - . . tse, g. m.; law, b. k.; chan, k. f.; mas, t. k., multinucleated stromal giant cells in mammary phyllodes tumours. pathology , , ( ), - . . celton-morizur, s.; merlen, g.; couton, d.; desdouets, c., polyploidy and liver proliferation: central role of insulin signaling. cell cycle , , ( ), - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / figure legends figure . 
illustration of the methodology followed in this study.
figure . simulated diabetes-induced changes in the metabolome of bovine coronary artery endothelial cells (bcaec). (a) venn diagram of features identified by mzmine and xcms software ( . da and min retention time thresholds) in lc-ms datasets. (b) volcano plot of all quantified metabolites displaying differences in relative abundance (> +/- % change, < . p-value cut-offs) between bcaec cultured in control (ng) media and simulated diabetes (hg+hi) for twelve days. values (dots) represent the hg+hi/ng ratio for all metabolites. red and blue dots denote downregulated and upregulated metabolites in the hg+hi group vs. the ng group, respectively. (c) principal component analysis (pca) of lc-ms datasets. data were log transformed without scaling. shaded areas depict the % confidence intervals. (d) heatmap of the top metabolites ranked by t-test. abbreviations: ng, normal glucose; hg, high glucose; hi, high insulin; qc, quality control.
figure . bovine coronary artery endothelial cells (bcaec) metabolite molecular network. (a) molecular classes (according to classyfire) of the metabolome identified by the molnetenhancer workflow and visualized by cytoscape version . . . each node represents a unique feature and the color of the node denotes the associated chemical class. the thickness of the edge (connectivity) indicates the ms similarity (cosine score) among features. the m/z value of the feature is shown inside the node, and the node size is proportional to it. three selected clusters of relevant connected features are shown. (b) inset of cluster denoting the presence of phosphocholine (pc)-containing lipids. significantly differentially abundant features between the simulated diabetes (hg+hi) and control (ng) groups are indicated with an asterisk (p-value < . ). (c) characterization of features in (b) aided by substructure recognition by mslda software using ms visualization in www.ms lda.org. fragment at m/z . linked to a pc head group by mzcloud in silico prediction (www.mzcloud.org). abbreviations: m m, mass motif; fc, fold change; ng, normal glucose; hg, high glucose; hi, high insulin. chemical structures were drawn by chemdraw professional version . . . .
figure . peptide metabolites modulated by simulated diabetes in bovine coronary artery endothelial cells (bcaec). (a) cluster retrieved from the main molecular network linked to glutathione and derivatives. the fragments of mass- -motif (m m)_ colored in red are characteristic of a glutathione core. (b) features associated with m m_ using ms visualization in www.ms lda.org. (c) cluster retrieved from the main molecular network linked to phenylalanine-based metabolites. a single node at m/z . is also shown. the fragments of m m_ colored in red are characteristic of a phenylalanine core (heuristic and quantum chemical predictions by www.mzcloud.org). (d) features associated with m m_ using ms visualization in www.ms lda.org. in the gnps clusters (a and c), the color of the node denotes the chemical class assigned to the cluster. the thickness of the edge (connectivity) indicates the cosine score (ms similarity). the m/z value of the feature is shown inside the node, and the node size is proportional to it. significantly differentially abundant features between the simulated diabetes (hg+hi) and control (ng) groups are indicated with an asterisk (p-value < . ). in the ms lda panels (b and d), the green node represents the m m and squares indicate individual features. edges represent connections to the m m.
significantly differentially abundant features between groups are indicated with an asterisk (p-value < . ). abbreviations: m m, mass motif; fc, fold change; ng, normal glucose; hg, high glucose; hi, high insulin. chemical structures were drawn by chemdraw professional version . . . .
figure . simulated diabetes-induced changes in the proteome of bovine coronary artery endothelial cells (bcaec). (a) principal component analysis (pca) of lc-swath-ms datasets. data were log transformed without scaling. shaded areas depict the % confidence intervals. (b) volcano plot of all quantified proteins (quantile normalization) displaying differences in relative abundance (> +/- % change, < . p-value cut-offs) between bcaec cultured in control (ng) media and simulated diabetes (hg+hi) for twelve days. values (dots) represent the hg+hi/ng ratio for all proteins. red and blue dots denote downregulated and upregulated proteins in the hg+hi group vs. the ng group, respectively. (c) heatmap of the top proteins ranked by t-test. protein-protein interactome (> . confidence) using the list of proteins with increased abundance (d) and reduced abundance (e) in the hg+hi group. colored circles denote modules or clusters which may represent relevant complexes or functional units. the input proteins are shaded in blue and the gene id is also shown. the most representative pathway (the one containing the most input proteins) for each module is indicated in blue letters. abbreviations: ng, normal glucose; hg, high glucose; hi, high insulin.
figure . d integrative network of the proteomic and metabolomic perturbations caused by simulated diabetes in bovine coronary artery endothelial cells (bcaec). composite protein-metabolite network created by omicsnet using the up-regulated proteins (red nodes) and metabolites (magenta nodes) in the hg+hi group (simulated diabetes). interacting proteins (< . confidence) were retrieved from the string database and are shown as gray nodes. abbreviations: ng, normal glucose; hg, high glucose; hi, high insulin.
figure . increased cellular binucleation by simulated diabetes in bovine coronary artery endothelial cells (bcaec) and human coronary artery endothelial cells (hcaec). (a) representative immunofluorescence micrographs showing the localization of the von-willebrand factor (vwf, : , % bsa in pbs) in fixed and permeabilized cells. the nuclei were stained using the dye hoechst ( µg/ml in hbss). white arrows indicate binucleated cells. (b) quantification of binucleated cells in hcaec and bcaec under simulated diabetes (hg+hi) vs. the control (ng) group. fluorescence images were taken in at least three random fields per condition using an evos® floid® cell imaging station with a fixed x air objective. image analysis was performed with imagej software (version . . ). abbreviations: ng, normal glucose; hg, high glucose; hi, high insulin.
figure . summary illustration of study findings. cellular structures were created using servier medical art templates, which are licensed under a creative commons attribution . unported license; https://smart.servier.com. chemical structures were drawn by chemdraw professional version . . . .
supporting information
table s . list of all the putatively annotated metabolites by ms spectral matching against gnps public spectral libraries.
table s . list of putatively annotated (ms spectral matching) metabolites modulated by simulated diabetes.
table s . list of all detected peptides by proteinpilot software using the metabolomics datasets.
table s . putatively annotated proline-peptides altered by simulated diabetes in bovine coronary artery endothelial cells by proteinpilot software and manual inspection.
table s . list of the detected peptides and proteins in all conditions for swath-based quantification.
figure s . proteomics data normalization results using normalyzerde. (a) total intensity of raw data before normalization. (b) quantitative parameters of normalization algorithms (pooled intragroup coefficient of variation [pcv], median absolute deviation [pmad], estimate of variance [pev]). qualitative parameters of normalization algorithms: (c) box plots, (d) ma plots, and (e) density plots.
figure s . cellular confluence in control and experimental group. representative micrographs of bovine coronary artery endothelial cells (bcaec) cultured for days with . mmol/l glucose (control group) and mmol/l glucose + nmol/l insulin (simulated diabetes or experimental group). images were taken using an evos® floid® cell imaging station with a fixed x air objective. abbreviations: ng, normal glucose; hg, high glucose; hi, high insulin.
[figure panels: study workflow (untargeted metabolomics, swath-based proteomics, omics integration, go analysis); metabolome overview (venn diagram, pca, heatmap of normalized metabolite abundance; groups ng, hg+hi, qc); molecular network with chemical class legend (glycerophospholipids, organooxygen compounds, fatty acyls, steroids and derivatives, glycerolipids, indoles and derivatives, organonitrogen compounds, coumarins and derivatives, carboxylic acids and derivatives, benzene and substituted derivatives, unknown); phosphocholine-, glutathione- and phenylalanine-based substructures.]
[figure panels: protein-protein interaction networks; pca scores plot and heatmap of normalized protein abundance; integrated protein-metabolite network; binucleation micrographs (hcaec, bcaec; nuclei, vwf); summary illustration.]
table . pathway enrichment analysis of up-regulated and down-regulated proteins in hg+hi group
reactome database
up-regulated | total | hits | fdr | down-regulated | total | hits | fdr
metabolism of rna | | | . e- | peptide chain elongation | | | . e-
metabolism of mrna | | | . e- | influenza infection | | | . e-
synthesis of dna | | | . e- | nonsense mediated decay independent of the exon junction complex | | | . e-
dna replication | | | . e- | influenza life cycle | | | . e-
dna replication pre-initiation | | | . e- | eukaryotic translation elongation | | | . e-
m/g transition | | | . e- | nonsense mediated decay enhanced by the exon junction complex | | | . e-
s phase | | | . e- | nonsense-mediated decay | | | . e-
g /s transition | | | . e- | influenza viral rna transcription and replication | | | . e-
assembly of the pre-replicative complex | | | . e- | viral mrna translation | | | . e-
metabolism of rna | | | . e- | eukaryotic translation termination | | | . e-
kegg database
up-regulated | total | hits | fdr | down-regulated | total | hits | fdr
basal transcription factors | | | . e- | basal transcription factors | | | . e-
mismatch repair | | | . e- | nucleotide excision repair | | | . e-
snare interactions in vesicular transport | | | . e- | renal cell carcinoma | | | . e-
base excision repair | | | . e- | endometrial cancer | | | . e-
human papillomavirus infection | | | . e- | peroxisome | | | . e-
chemical carcinogenesis | | | . e- | nicotine addiction | | | . e-
hepatocellular carcinoma | | | . e- | ribosome biogenesis in eukaryotes | | | . e-
human t-cell leukemia virus infection | | | . e- | gap junction | | | . e-
chronic myeloid leukemia | | | . e- | herpes simplex virus infection | | | . e-
notch signaling pathway | | | . e- | glutamatergic synapse | | | . e-
table . integrative pathway enrichment analysis of up-regulated proteins and metabolites in hg+hi group
reactome database | total | hits | fdr | kegg database | total | hits | fdr
metabolism of amino acids and derivatives | | | . e- | egfr tyrosine kinase inhibitor resistance | | | . e-
metabolism | | | . e- | glutathione metabolism | | | . e-
glutathione conjugation | | | . e- | alanine, aspartate and glutamate metabolism | | | . e-
phase ii conjugation | | | . e- | abc transporters | | | . e-
amino acid synthesis and interconversion (transamination) | | | . e- | cysteine and methionine metabolism | | | . e-
biological oxidations | | | . e- | pancreatic cancer | | | . e-
trna aminoacylation | | | . e- | drug metabolism - cytochrome p | | | . e-
glutathione synthesis and recycling | | | . e- | metabolism of xenobiotics by cytochrome p | | | . e-
sulfur amino acid metabolism | | | . e- | drug metabolism - other enzymes | | | . e-
tryptophan catabolism | | | . e- | mrna surveillance pathway | | | . e-
isolation of the buchnera aphidicola flagellum basal body from the buchnera membrane
matthew j. schepers, james n. yelland, nancy a. moran*, david w. taylor*
institute for cell and molecular biology, university of texas at austin, austin, tx
department of integrative biology, university of texas at austin, austin, tx
department of molecular biosciences, university of texas at austin, austin, tx
center for systems and synthetic biology, university of texas at austin, austin, tx
livestrong cancer institute, dell medical school, austin, tx
*correspondence to: dtaylor@utexas.edu (d.w.t.); nancy.moran@austin.utexas.edu (n.a.m.)
(which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . .
abstract
buchnera aphidicola is an intracellular bacterial symbiont of aphids and maintains a small genome of only kbps. buchnera is thought to maintain only genes relevant to the symbiosis with its aphid host. curiously, the buchnera genome contains gene clusters coding for flagellum basal body structural proteins and for flagellum type iii export machinery. these structures have been shown to be highly expressed and present in large numbers on buchnera cells. no recognizable pathogenicity factors or secreted proteins have been identified in the buchnera genome, and the relevance of this protein complex to the symbiosis is unknown. here, we show isolation of buchnera flagella from the cellular membrane of buchnera, confirming the enrichment of flagellum proteins relative to other proteins in the buchnera proteome. this will facilitate studies of the structure and function of the buchnera flagellum and its role in this model symbiosis.
introduction
buchnera aphidicola is an obligate endosymbiont of aphid species worldwide and is a model for bacterial genome reduction, maintaining one of the smallest genomes yet discovered, only kbps. though buchnera has lost genes not essential for its symbiotic lifestyle, it retains genes associated with amino acid biosynthesis, reflecting its participation in a nutritional symbiosis. though the exchange of amino acids and vitamins between the aphid host and buchnera has been well documented, the molecular mechanism by which these metabolites cross buchnera membranes is unknown: buchnera maintains a small number of genes coding for membrane transport proteins, most of which are located at the inner membrane. the permeability of the buchnera outer membrane remains a mystery, considering the paucity of annotated transporter genes in sequenced buchnera genomes. genes coding for proteins localizing to the outer membrane of buchnera include small β-barrel aquaporins, which allow passive diffusion of small molecules, and flagellum basal body components. investigation into protein expression by these symbiotic partners has shown that flagellum basal body components are highly expressed by buchnera. indeed, transmission electron microscopy images of buchnera reveal flagellum basal bodies studded all over the bacterial outer membrane. despite its abundance on the buchnera cell surface, the role of this protein complex in maintaining the aphid-buchnera symbiosis is unknown. buchnera of the pea aphid (acyrthosiphon pisum) maintains genes coding for flagellum proteins in three discrete clusters.
the maintained genes code for the structural proteins required for formation of a flagellum basal body and a partial flagellar hook, as well as the type iii cytoplasmic export proteins. buchnera lineages vary in the set of flagellum genes retained (supplementary table ), but all have lost genes encoding the flagellin and motor proteins, indicating a functional shift away from cell motility. the bacterial flagellum structure is an evolutionary homologue of the injectisome (type iii secretion system, or t ss), a macromolecular protein complex used to deliver secreted effector proteins, often to a eukaryotic host. flagellum assembly occurs in a stepwise, sequential manner beginning from the bacterial cytoplasm, as in the t ss. buchnera maintains genes coding for the proteins required for a functional t ss, as shown in studies of yersinia and salmonella. gram-negative bacteria have also been shown to export proteins through a flagellum basal body. the bacterial flagellum could be repurposed to serve a novel function for the aphid-buchnera symbiosis. the basal body could serve as a type iii protein exporter to secrete proteins that signal to the aphid host, or as a surface signal molecule for host recognition during infection of new aphid embryos. here, we present a procedure for isolation of flagellum basal body complexes adapted for an endosymbiont, allowing for removal of these structures directly from buchnera and enrichment of flagellum basal body complexes after isolation. this procedure will enable further characterization of the basal bodies and their modifications for a role in symbiosis.
results
isolation of hook basal bodies from buchnera
purification of the complex was initially assessed at multiple timepoints along the procedure. samples were taken of initial buchnera cell lysate, lysate after raising the ph to , protein suspension after the first g spin, the third g spin, and finally after the , g spin and overnight incubation in tet buffer. sds-page showed that sixteen bands were present after the staining procedure and that their sizes corresponded to those of constituent proteins of the buchnera flagellum basal body (supplemental figure ). protein samples were extracted from the gel and subjected to mass spectrometry analysis.
mass spectrometry analysis of isolated basal bodies
protein id lc-ms/ms spectral counts were provided by the university of texas at austin proteomics core facility. we compared our samples to proteomic datasets from homogenized whole aphids and from bacteriocytes purified from pea aphids. buchnera flagellum proteins were highly enriched by our isolation procedure, especially flif, flgi, flge, flha, and flgf (figure .). these results indicate that all but two flagellum proteins present in the mass spectrometry samples were enriched during the isolation procedure: structural proteins flie, flif, flgi, flge, flgf, and flgh were enriched threefold or more from the start to the finish of the procedure. flgb, flgc, flgg, flig, flih, and flii were enriched, though not to the extent of the other structural proteins. type iii secretion proteins flha and flip were also enriched by this procedure (figure ., supplemental figure .).
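the fold enrichment described above can be illustrated with a minimal sketch. it assumes that enrichment is the ratio of a protein's share of total spectral counts in the final fraction to its share in the initial lysate; the source does not state its normalization, so this is only one plausible choice. the function name, the pseudocount and all counts below are hypothetical and are not taken from the study.

```python
def fold_enrichment(start_counts, end_counts, pseudocount=0.5):
    """Return per-protein fold enrichment between two fractions.

    Counts are normalized to the total spectral counts of each fraction
    (an assumption, not the authors' stated method). The pseudocount
    avoids division by zero for proteins absent from one fraction.
    """
    start_total = sum(start_counts.values())
    end_total = sum(end_counts.values())
    result = {}
    for protein in start_counts.keys() | end_counts.keys():
        start_share = (start_counts.get(protein, 0) + pseudocount) / start_total
        end_share = (end_counts.get(protein, 0) + pseudocount) / end_total
        result[protein] = end_share / start_share
    return result


# Hypothetical spectral counts, for illustration only.
lysate = {"flif": 10, "flgk": 8, "other_protein": 182}
isolate = {"flif": 60, "flgk": 2, "other_protein": 38}

enrichment = fold_enrichment(lysate, isolate)
# Proteins enriched threefold or more between the first and last fraction.
enriched = sorted(p for p, fc in enrichment.items() if fc >= 3)
print(enriched)
```

with these made-up counts, only the hypothetical flif entry passes the threefold cut-off, while the flgk entry falls below 1, mirroring the qualitative pattern reported in the text.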
the widespread enrichment of buchnera flagellum proteins indicates that our adapted procedure for isolating macromolecular protein complexes from the membranes of endosymbiotic bacteria was successful. only flagellum proteins flgk and flin were reduced by the isolation procedure, perhaps because of their localization to the periphery of the flagellum. basal bodies resemble top hats via electron microscopy we analyzed the isolated basal bodies by negative stain electron microscopy. while raw micrographs showed heterogenous particles, likely due to disassembly of the complex, detergent micelles, and contaminating proteins, there were several particles that appeared regularly. these single particles resembled a top hat with both rod and ring-shaped features (figure ), similar in size and shape to those observed in tem images of whole buchnera cells . discussion here, we demonstrate a procedure for isolating macromolecular protein complexes from buchnera aphidicola, an obligate endosymbiotic bacterium that cannot be cultured or genetically manipulated. identifying the changes in these complexes could elucidate how buchnera’s adaptation over millions of years to a mutualistic lifestyle has affected its proteome. as buchnera is not motile and is confined to host-derived “symbiosomal” vesicles inside bacteriocytes , , the retention and expression of these partial flagella indicates that they have become repurposed. these complexes have previously been hypothesized to be acting as type (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . iii secretion systems for provisioning peptides or signal factors to the aphid host . 
indeed, the proteins retained in the buchnera flagellum constitute the structural proteins and machinery required for a functional type iii secretion system. transcriptome analyses of pea aphid lines with different buchnera titers reveal differences in expression of flagellar genes. in aphid lines that harbor relatively low numbers of buchnera, the endosymbionts have elevated relative expression of mrna associated with flagellar secretion genes (flip, fliq, and flir), while buchnera in aphid lines with high buchnera numbers had elevated expression of genes for flagellum structural proteins. though heavily expressed in buchnera of pea aphids, components of the flagellum basal body are not maintained equally among lineages of buchnera of different aphid species based on available genomic sequences (supplementary table ). genes coding for proteins associated with type iii secretion activity (flha, flhb, flip, fliq, and flir) and basal body structural proteins (flie, flif, flgb, flgc, flgf, flgg, and flgh) are well maintained across buchnera lineages, but genes coding for hook proteins (flgd, flge, and flgk) and the flagellum-specific atpase (flii) are frequently shed. a more extreme example is the buchnera strain harbored by aphids of genus stegophylla: having the smallest sequenced buchnera genome discovered thus far ( kbps), these buchnera have completely lost genes associated with flagellum structure and type iii secretion activity. in all but the most extreme examples, the buchnera flagellum is well maintained, pointing to a continuing role for this complex in this ancient symbiosis. buchnera’s tiny genome contains no known pathogenicity proteins or proteins previously associated with type iii export. potentially, buchnera flagellum basal bodies may instead serve as surface signals for recognition by the host.
during vertical transfer of buchnera from mother to daughter aphids, naked buchnera cells are exocytosed from maternal bacteriocytes and move in aphid haemolymph to infect a nearby specialized syncytial cell of stage embryos. the purpose of the flagellum in the context of buchnera’s symbiotic lifestyle remains unknown. further inquiry into this protein complex could reveal how the repurposing of a motility organelle facilitates this ancient and obligate symbiosis.

methods

buchnera extraction from aphids

pea aphids (acyrthosiphon pisum strain lsr ) were placed as all-female clones on fava bean (vicia faba) seedlings on h/ h light/dark cycles at ºc. once reaching adulthood, apterous adults were raised on fava bean plants on h/ h light cycles and allowed to reproduce. after seven days, all aphids (fourth-instar larvae, typically amounting to g) were removed from the fava bean plants. aphids were weighed and surface-sterilized in . % bleach solution, then rinsed twice in ultrapure water (milliporesigma), each for seconds. aphids were gently ground in a mortar and pestle in ml sterile buffer a ( mm kcl (sigma-aldrich), mm tris base (sigma-aldrich), mm mgcl (sigma-aldrich), mm anhydrous edta (sigma-aldrich), and mm sucrose (sigma-aldrich) at ph . ). aphid homogenate was vacuum filtered to μm, then centrifuged at g for minutes at ºc. supernatant was discarded, and the resulting pellet was resuspended in ml buffer a and vacuum-filtered three times, from μm, to μm, and finally to μm. the resulting filtrate was spun at g for m at ºc and the supernatant discarded.
the resulting pellet was resuspended in ml sucrose solution ( mm sucrose (sigma-aldrich) and mm tris base (sigma-aldrich)), then checked on a brightfield microscope for intact buchnera cells. buchnera cells remain alive while at ºc for a maximum of h.

isolation of flagellum basal bodies from buchnera cells

buchnera was incubated with gentle spinning on ice with egg white lysozyme ( . mg/ml, sigma-aldrich) for m. mm anhydrous edta solution, ph . (sigma-aldrich) was added to a final concentration of mm. the pellet was taken off ice and gradually raised to room temperature with gentle spinning for m. triton x- (acros organics) was added to % w/v, along with mg/ml rnase-free dnase i (bovine pancreas, sigma-aldrich), and allowed to stir for / hour. after incubation, cell lysate was kept at ºc or on ice until use. the lysate was raised to ph using n naoh (macron fine chemicals) to attempt to denature host and bacterial cytoplasmic proteins. the solution was spun at g for m at ºc three times, each time decanting the supernatant to a new tube. after three spins, the supernatant was transferred to a nalgene oak ridge polyallomer centrifuge tube (thermo-fisher) and spun at , g for h at ºc. the supernatant was gently decanted and the pellet covered with tet buffer ( mm tris-hcl, mm edta, . % x- , ph . ) and left overnight at ºc to soften and dissolve.

submission of protein for mass spectrometry

solubilized protein concentration was determined using an eppendorf biophotometer. . mg protein was run on premade - % tris-glycine sds-page gels (thermo-fisher) at v for m. gels were stained in coomassie brilliant blue (bio-rad) for m, then destained in % acetic acid (thermo-fisher) for m.
gel bands corresponding to the step in the procedure sampled (“lysate,” “ph ,” “spin ,” “spin ,” “final”) were cut out and submitted to the university of texas at austin cbrs biological mass spectrometry facility for lc-ms/ms using a dionex ultimate rslcnano lc coupled to a thermo orbitrap fusion (thermo-fisher). samples were submitted in ml destain with buchnera aphidicola str. aps provided as the reference organism (asm v ). prior to hplc separation, peptides were desalted using millipore u-c ziptip pipette tips (millipore-sigma). a cm long x μm id c trap column was followed by a cm long x μm analytical column packed with c μm material (thermo acclaim pepmap , thermo-fisher) running a gradient from - %. the ft-ms resolution was set to , , with an ms/ms cycle time of seconds and acquisition in hcd ion trap mode. raw data were processed using sequest ht embedded in proteome discoverer (thermo-fisher). scaffold (proteome software) was used for validation of peptide and protein ids.

em and data collection

protein from the final step of this procedure was stained using % uranyl acetate on a -mesh continuous carbon grid. images were acquired using an fei talos transmission electron microscope operating at kv, with . second exposures, a dose rate of e-Å- , and a nominal magnification of , x.

whole aphid proteomic samples

for controls, proteomes were profiled for whole aphids, including both buchnera and aphid cells. aphids were mixed-age populations grown at ºc in cup cages and pooled into three replicate samples. aphids were washed and homogenized in buffer as described above. the homogenate was centrifuged at g for min at ºc, supernatant was removed, and the pellet was suspended with % sds, . m tris-hcl, .
m dtt at ºc for min, then centrifuged at , g for min at ºc to remove non-soluble material after adding the same volume of m urea. protein concentration was determined on an eppendorf biophotometer. mg total protein was run on a bis-tris gel for less than cm, and the band was excised and sent to the ut proteomics core for lc-ms/ms protein id. protein id methods were identical to those detailed above.

author contributions

m.j.s. raised aphids and prepared buchnera protein extracts. j.n.y. performed electron microscopy. n.a.m. and d.w.t. analyzed data and supervised and secured funding for this work. all authors reviewed the final manuscript.

acknowledgements

we thank eric verbeke, jack bravo, and evan schwartz for their advice and ideas for isolating and imaging proteins from native cells; julie perreau, margaret steele, and serena zhao for creating a space in which ideas and techniques could be shared freely; and kim hammond for help with aphid raising and organization. this work was supported by the national science foundation (to n.a.m.), a welch foundation grant f- (to d.w.t.), national institute of general medical sciences (nigms) of the national institutes of health (nih) r gm (to d.w.t.), army research office grant w nf- - - (to d.w.t.), and a robert j. kleberg, jr. and helen c. kleberg foundation medical research award (to d.w.t.). d.w.t. is a cprit scholar supported by the cancer prevention and research institute of texas (rr ) and an army young investigator supported by the army research office (w nf- - - ).

competing interests

the authors declare no competing interests.

figures

figure : barplot showing flagellum protein enrichment before (lysate) and after (final) the isolation procedure compared to proteomic datasets generated with whole aphids and dissected bacteriocytes. blue indicates “core” proteins required for secretion activity and red indicates accessory proteins maintained by buchnera aphidicola in pea aphids.

figure : cartoon diagram of the reduced buchnera aphidicola (pea aphid) flagellum. colors indicate enrichment status of individual proteins at the final step of the procedure, corresponding to figure .

figure : single particles of buchnera flagellum complexes after the isolation procedure. scale bars represent nm.

supplementary figure : silver stained sds gel created after the enrichment procedure was performed. the first lane is taken directly from the enrichment preparation after overnight incubation with tet buffer. the second lane is after concentrating the enriched proteins to mg/ml. the third lane is concentrated protein diluted to . mg/ml.
ladder values represent molecular weight in kda. symbols correspond to flagellar protein molecular weight: * corresponds to flha ( kda). † corresponds to flif ( kda) and flgk ( kda). º corresponds to flge ( kda), flip ( kda), and flgi ( kda). ‡ corresponds to flig ( kda) and flim ( kda). ∆ corresponds to flgg ( kda), flgf ( kda), flgh ( kda), and flih ( kda). Ø corresponds to flgb ( kda), flin ( kda), flgc ( kda), and flie ( kda).

supplementary figure : dotplot of buchnera aphidicola flagellum proteins found after lc/ms-ms analysis. the enrichment score for each protein is indicated on the x axis. enrichment scores are calculated by dividing unique spectral counts for each protein in the final step by each protein present in the cell lysate. core flagellum proteins (defined by proteins required for type iii secretion activity and flagellum structure) are filled in green, accessory proteins are filled in white. [dotplot: x axis, enrichment score after isolation procedure; y axis, protein; legend, type (accessory, core)]

references

. munson, m.a., baumann, p., kinsey, m.g. buchnera gen. nov. and buchnera aphidicola sp. nov., a taxon consisting of the mycetocyte-associated, primary endosymbionts of aphids. int. j. syst. bacteriol. ; : - . shigenobu, s., watanabe, h., hattori, m., sakaki, y. & ishikawa, h. genome sequence of the endocellular bacterial symbiont of aphids buchnera sp. aps.
nature , - ( ). . moran, n. a. & bennett, g. m. the tiniest tiny genomes. annu rev microbiol , - ( ). . tamas, i. et al. million years of genomic stasis in endosymbiotic bacteria. science , - ( ). . wernegreen, j. j. genome evolution in bacterial endosymbionts of insects. nature reviews genetics , - ( ). . douglas, a. e. nutritional interactions in insect-microbial symbioses: aphids and their symbiotic bacteria buchnera. annu rev entomol , - ( ). . akman gündüz, e. & douglas, a. e. symbiotic bacteria enable insect to use a nutritionally inadequate diet. proc biol sci , - ( ). . nakabachi, a. & ishikawa, h. provision of riboflavin to the host aphid, acyrthosiphon pisum, by endosymbiotic bacteria, buchnera. j insect physiol , - ( ). . charles, h., calevro, f., vinuelas, j., fayard, j. m. & rahbe, y. codon usage bias and trna over-expression in buchnera aphidicola after aromatic amino acid nutritional stress on its host acyrthosiphon pisum. nucleic acids res , - ( ). . charles, h. et al. a genomic reappraisal of symbiotic function in the aphid/buchnera symbiosis: reduced transporter sets and variable membrane organisations. plos one , e ( ). . poliakov, a. et al. large-scale label-free quantitative proteomics of the pea aphid- buchnera symbiosis. mol cell proteomics , m . ( ). . maezawa, k. et al. hundreds of flagellar basal bodies cover the cell surface of the endosymbiotic bacterium buchnera aphidicola sp. strain aps. j bacteriol , - ( ). . denise, r., abby, s. s. & rocha, e. p. c. the evolution of protein secretion systems by co-option and tinkering of cellular machineries. trends in microbiology ( ). . chong, r. a., park, h. & moran, n. a. genome evolution of the obligate endosymbiont buchnera aphidicola. mol biol evol ( ). . cornelis, g. r. & van gijsegem, f. assembly and function of type iii secretory systems. annu rev microbiol , - ( ). . moya, a., peretó, j., gil, r. & latorre, a. 
learning how to live together: genomic insights into prokaryote-animal symbioses. nat rev genet , - ( ). . abby, s. s. & rocha, e. p. the non-flagellar type iii secretion system evolved from the bacterial flagellum and diversified into host-cell adapted systems. plos genet , e ( ). . marlovits, t. c. et al. structural insights into the assembly of the type iii secretion needle complex. science , - ( ). . liu, r. & ochman, h. stepwise formation of the bacterial flagellar system. proc natl acad sci u s a , - ( ). . ince, d., sutterwala, f. s. & yahr, t. l. secretion of flagellar proteins by the pseudomonas aeruginosa type iii secretion-injectisome system. journal of bacteriology , - ( ). . young, g. m., schmiel, d. h. & miller, v. l. a new pathway for the secretion of virulence factors by bacteria: the flagellar export apparatus functions as a protein-secretion system. proc natl acad sci u s a , - ( ). . irikura, v. m., kihara, m., yamaguchi, s., sockett, h. & macnab, r. m. salmonella typhimurium flig and flin mutations causing defects in assembly, rotation, and switching of the flagellar motor. j bacteriol , - ( ). . minamino, t. & macnab, r. m. components of the salmonella flagellar export apparatus and classification of export substrates. j bacteriol , - ( ). . konkel, m. e. et al. secretion of virulence proteins from campylobacter jejuni is dependent on a functional flagellar export apparatus. j bacteriol , - ( ). . scanlan, e., yu, l., maskell, d., choudhary, j. & grant, a. a quantitative proteomic screen of the campylobacter jejuni flagellar-dependent secretome. j proteomics , - ( ). . lópez-sánchez, m. j. et al.
evolutionary convergence and nitrogen metabolism in blattabacterium strain bge, primary endosymbiont of the cockroach blattella germanica. plos genet , e ( ). . tegunov, d. & cramer, p. real-time cryo-electron microscopy data preprocessing with warp. nat methods , - ( ). . braendle, c. et al. developmental origin and evolution of bacteriocytes in the aphid-buchnera symbiosis. plos biol , e ( ). . miura, t. et al. a comparison of parthenogenetic and sexual embryogenesis of the pea aphid acyrthosiphon pisum (hemiptera: aphidoidea). j exp zool b mol dev evol , - ( ). . smith, t. e. & moran, n. a. coordination of host and symbiont gene expression reveals a metabolic tug-of-war between aphids and buchnera. proc natl acad sci u s a , - ( ). . shimomura, s., shigenobu, s., morioka, m. & ishikawa, h. an experimental validation of orphan genes of buchnera, a symbiont of aphids. biochem biophys res commun , - ( ). . koga, r., meng, x. y., tsuchida, t. & fukatsu, t. cellular mechanism for selective vertical transmission of an obligate insect symbiont at the bacteriocyte-embryo interface. proc natl acad sci u s a , e - ( ).
a high content lipidomics method using scheduled mrm with variable retention time window and relative dwell time weightage

akash kumar bhaskar , , salwa naushin , , arjun ray , shalini pradhan , khushboo adlakha , towfida jahan siddiqua , , dipankar malakar , shantanu sengupta , *

csir-institute of genomics and integrative biology, mathura road, new delhi- , india
academy of scientific and innovative research (acsir), ghaziabad- , india
department of computational biology, indraprastha institute of information technology, okhla, new delhi- , india
nutrition and clinical services division, international centre for diarrheal disease research, dhaka- , bangladesh
sciex, , udyog vihar, phase iv, gurgaon- , haryana, india

*address for correspondence: shantanu sengupta, csir-institute of genomics and integrative biology, mathura road, delhi - . email: shantanus@igib.res.in

(which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . .

abstract: lipids are a highly diverse group of biomolecules that play a pivotal role in biological processes. lipid compositions of bio-fluids are complex, reflecting a wide range of concentrations of different lipid classes with structural diversity within lipid species. varying degrees of chemical complexity make their identification and quantification challenging. newer methods are thus highly desired for comprehensive analysis of lipid species, including identification of structural isomers.
herein, we propose a targeted mrm method for large scale high-throughput lipidomics analysis using a combination of variable retention time window (variable-rtw) and relative dwell time weightage (relative-dtw) for different lipid species. with this method, we were able to detect more than lipid species (encompassing lipid classes), including different structural isomers of triglyceride, diglyceride, and phospholipids, in a single run of minutes. the limit of detection varied between . pmol/l and nmol/l for different lipid classes, with fmol/l being lowest for phosphatidylethanolamine while it was highest for diacylglycerol ( nmol/l). similarly, the limit of quantitation varied from fmol/l to nmol/l. the recovery of the method is in the acceptable range and the of lipid species were found to have a coefficient of variance (cv) < %. using this method we demonstrate that lipids with ω- and ω- fatty acid chains are altered in individuals with vitamin b deficiency.

introduction: lipids constitute a highly diverse group of biomolecules that play an important role in the normal functioning of the body, maintaining cellular homeostasis, cell signaling and energy storage. dysregulation of lipid homeostasis is associated with a large number of pathologies such as obesity and diabetes, cardiovascular disease, and cancer. lipid compositions of bio-fluids are complex, reflecting a wide range of concentrations of different lipid classes and structural diversity within lipid species. although the number of distinct lipids present in cells is not exactly known, it is believed that the cellular lipidome consists of more than different lipid species, each with several structural isomers.
identification of lipids using traditional methods like thin layer chromatography, gas chromatography, etc. is limited by their lower sensitivity and accuracy and hence is not suitable for comprehensive lipidomics studies. recent advances in mass spectrometry (ms) based lipidomics have enabled accurate identification of a large number of lipid species from various biological sources. analysis of lipids in both positive and negative ion modes in a single mass spectrometric scan, using an untargeted or targeted approach, has been used for greater coverage. the untargeted lipidomics approach, however, has some major challenges, especially with respect to identification and characterization of the lipid species, the time required to process large quantities of raw data, and the bias towards the detection of high-abundance lipids. these problems are greatly reduced in a targeted approach using multiple reaction monitoring (mrm), since defined groups of chemically characterized and annotated lipid species are analyzed. the use of mrm enables simultaneous identification of around a hundred lipid species, including those with low abundance. the number of lipid species identified can be further increased by using scheduled mrm, where the mrm transitions are monitored only around the expected retention time of the eluting lipid species. this enables monitoring of a greater number of mrm transitions in a single mass spectrometric acquisition. using scheduled mrm, takeda et al. and other groups were able to identify and quantify hundreds of lipid species, including isomers of phospholipids (pls) and diacylglycerol (dag), in a single targeted scan. however, identification of triacylglycerols (tag) was based on pseudotransitions, as identifying different species of tag is challenging. the retention time window chosen in a scheduled mrm is usually of a fixed width.
however, as the retention time window width varies for each lipid species, a variable window width for each lipid species could reduce the time necessary to develop high throughput targeted methods. there are a few reports where a variable retention time window (dynamic mrm) has been used in various applications, including identifying lipids of a specific class. however, none of these studies involved comprehensive lipidome analysis. further, in these studies, the dwell time for each peak was automatically fixed on the basis of the rt window width chosen. the quality of peaks can be improved by varying the dwell time weightage for each transition without compromising the cycle time. assigning a low dwell time weightage to highly abundant compounds and a high dwell time weightage to less abundant compounds, irrespective of the elution window, could help in accommodating a large number of transitions in a single run with improved data quality. here we report a rapid and sensitive targeted lipidomics method using scheduled mrm with variable retention time width and relative dwell time weightage that enabled the identification of more than lipid species, including isomers of triglycerides, diglycerides, and phospholipids, in a single mass spectrometric scan of minutes. as a demonstration of the applicability of this method, we show that vitamin b deficiency leads to alteration of various lipid species, which could explain the association of vitamin b deficiency with cardio-metabolic diseases previously reported in various studies. to the best of our knowledge, this is the largest number of lipid species identified to date in a single experiment.
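the idea of per-transition retention time windows combined with relative dwell time weights can be sketched as follows. this is only an illustrative model, assuming that the usable cycle time is shared among the transitions that are active at a given moment in proportion to their weights; it is not the vendor's scheduling algorithm, and the function names, transition names, and all numbers are hypothetical.

```python
def concurrent_transitions(windows, t):
    """Return the transitions whose retention time window contains time t.

    windows maps transition name -> (expected_rt, window_width), so each
    transition carries its own window width (variable retention time window).
    """
    return [
        name
        for name, (expected_rt, width) in windows.items()
        if abs(t - expected_rt) <= width / 2
    ]


def allocate_dwell_times(weights, cycle_time_ms, pause_ms=0.0):
    """Split a fixed cycle time among transitions in proportion to their weights.

    A higher weight gives a longer dwell time; the sum of dwell times plus
    per-transition pauses always equals cycle_time_ms, so the cycle time is kept.
    """
    if not weights:
        return {}
    usable = cycle_time_ms - len(weights) * pause_ms
    total = sum(weights.values())
    if usable <= 0 or total <= 0:
        raise ValueError("cycle time and weights must leave positive dwell time")
    return {name: usable * w / total for name, w in weights.items()}


# Hypothetical transitions: (expected retention time in min, window width in min).
windows = {
    "abundant_lipid": (5.0, 0.5),
    "rare_lipid": (5.1, 1.0),   # broader window for a more variable peak
    "late_lipid": (9.0, 0.5),
}
# Hypothetical relative weights: the low-abundance species gets more dwell time.
weights = {"abundant_lipid": 1, "rare_lipid": 3, "late_lipid": 1}

active = concurrent_transitions(windows, t=5.2)
dwell = allocate_dwell_times({n: weights[n] for n in active}, cycle_time_ms=100.0)
```

in this toy case only two transitions overlap at 5.2 minutes, so the whole cycle is divided between them and the less abundant species receives three times the dwell time of the abundant one, while the total stays at the chosen cycle time.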
materials and methods

chemicals and reagents

ms-grade acetonitrile, methanol, water, -propanol (ipa) and hplc-grade dichloromethane (dcm) were purchased from biosolve (dieuze, france); ammonium acetate and ethanol were obtained from merck (merck & co. inc., kenilworth, nj, usa). lipid internal standards used in the study: sm (d : - : (d )), tag ( : - : (d )- : ), dag ( : - : (d )), lpc ( : (d )), pc ( : - : (d )), lpe ( : (d )), pe ( : - : (d )), pg ( : - : (d )), pi ( : - : (d )), ps ( : - : (d )), pa ( : - : (d )) in the form of splash mix, and ceramide ( : ), were purchased from avanti polar lipids (alabaster, alabama, usa).

lipid extraction from human plasma

we used a modified bligh and dyer method using dichloromethane/methanol/water ( : : v/v). the study was approved by the institutional ethical committee of csir-igib. human plasma ( μl) was mixed with μl of water (in a glass tube) and incubated on ice for minutes. lipid internal standard mix ( µl, consisting of splashmix and ceramide) was added to a mixture of methanol ( ml) and dichloromethane ( ml); the mixture was vortexed and allowed to incubate for minutes at room temperature. after incubation, μl water and ml dichloromethane were added to the solution, which was vortexed for seconds. the mixture was centrifuged at g for minutes, after which phase separation was observed. the lower organic layer was collected into a fresh glass tube. ml dichloromethane was added to the remaining mixture in the extraction tube, which was centrifuged again to collect the lower layer. the previous step was repeated one more time. solvent was evaporated in a vacuum dryer at °c and the lipids were resuspended in μl of ethanol, vortexed for minutes, sonicated for minutes and again vortexed for minutes. the suspension was transferred to polypropylene auto sampler vials and subjected to an lc-ms run.

liquid chromatography-mass spectrometry:

we used an exion lc system with a waters acquity uplc beh hilic xbridge amide column ( . µm, . x mm) for chromatographic separation. the oven temperature was set at °c and the auto sampler was set at °c. lipids were separated using buffer a ( % acetonitrile with mm ammonium acetate, ph- . ) and buffer b ( % acetonitrile with mm ammonium acetate, ph- . ) with the following gradient: with a flow rate of . ml/minute, buffer b was increased from . % to % in minutes, then increased to % buffer b in the next minutes. in the next minute buffer b was ramped up to %, further increased to % in the next minutes, and held at the same concentration and flow rate for seconds. the flow rate was increased from . ml/min to . ml/min and % buffer b was maintained for the next . minutes. buffer b was brought to the initial . % concentration in . minute and the column was equilibrated at the same concentration and flow for . minutes before the flow rate was brought to the initial . ml/minute in the next seconds and maintained there till the end of the minutes gradient. additionally, the separation system was equilibrated for minute for subsequent runs. a sciex qtrap + lc/ms/ms system in low mass range, with a turbo source and electrospray ionization (esi) probe, was used with the following parameters: curtain gas (cur): psi, temperature (tem): degree, source gas (gs ): and source gas (gs ): psi, ionization voltage (is): for positive mode and is: - for negative mode, target scan time: . sec, scan speed: da/s, settling time: . msec and mr pause: . msec. acquisition was done using analyst . . software.

method development:

for identification and relative quantification of all the lipid species, a theoretical mrm library was generated using lipidmaps (https://www.lipidmaps.org/).
using internal standards from different lipid classes, the mrm parameters (collision energy, declustering potential, cell exit potential, and entrance potential) were optimized for lipid species which belonged to lipid classes: sphingomyelin (sm), ceramide (cer), cholesterol ester (ce), monoacylglycerol (mag), diacylglycerol (dag), triacylglycerol (tag), lysophosphatidic acid (lpa), phosphatidic acid (pa), lysophosphatidylcholine (lpc), phosphatidylcholine (pc), lysophosphatidylethanolamine (lpe), phosphatidylethanolamine (pe), lysophosphatidylinositol (lpi), phosphatidylinositol (pi), lysophosphatidylglycerol (lpg), phosphatidylglycerol (pg), lysophosphatidylserine (lps), and phosphatidylserine (ps) (supplementary table- ). the mrm library consisted of transitions including internal standards, of which species were identified in positive mode (sm, ce, cer, tag, dag, mag) and identified in negative mode (phospholipids and lysophospholipids). the current mrm panel covers major lipid classes and categories having fatty acids with - carbons and - double bonds per fatty acyl chain. transitions were distributed into multiple unscheduled mrm methods and the relative retention time of each transition was determined with respect to its respective internal standard on the amide-hilic column. furthermore, the retention time validation was done by performing an ms/ms experiment using information dependent acquisition (ida) with enhanced product ion scan (epi) of specific ions in unscheduled mrm for each lipid class. ms/ms analysis in epi mode was based on the conventional triple quadrupole ion path property of an ion-trap for the third quadrupole.
the basic parameters were kept the same as in the mrm experiment. ms/ms spectra were compared with ms/ms information from lipid maps (http://www.lipidmaps.org/) to verify the structures of the putative lipid species and to predict the structure from ms/ms spectra based on specific cleavage rules for lipids.

retention time window and dwell time weightage: using smrm builder (https://sciex.com/), an excel based tool from sciex, the variable retention time window and variable dwell time weightage for all transitions were optimized. the tool works on the basis of the width and intensity of the chromatographic peak. with variable retention time window width, each mrm transition can have its own rt window. wider windows are assigned to analytes that show higher run to run variation or have broader peak widths. variable dwell times were assigned to improve the signal to noise ratio of mrm transitions based on the abundance of the analyte in the sample: a higher dwell time weightage was assigned to analytes with low abundance (supplementary table ). the dwell time for each species was assigned based on this weight, which maintains the cycle time and optimizes the signal to noise ratio for low abundant peaks. details of the optimized parameters are given in supplementary table .

limit of detection and quantitation: the limits of detection and quantitation were derived from the peak area of known amounts of lipid internal standards added to lipid extract from human plasma (matrix). the master mix of lipid internal standards was prepared from splashmix and ceramide ( : ) with the following concentrations: sm ( . nmol), cer ( . nmol), tag ( . nmol), dag ( . nmol), lpc ( . nmol), pc ( . μmol), lpe ( . nmol), pe ( . nmol), pg ( . nmol), pi ( . nmol), ps ( . nmol), pa ( . nmol). the limit of blank (lob) was defined as the average signal (based on triplicate experiments) found in matrix only (without internal standards; blank).
lob was calculated using the mean and standard deviation from plasma matrix:

lob = mean blank + . (sd blank)

the raw analytical signal obtained for standards from plasma lipid extract (spiked with standards) was used to estimate the lod and loq, using the following formulas:

lod = mean blank + (standard deviation blank)
loq = mean blank + (standard deviation blank)

the standard solution was diluted serially with matrix and the lipid standards were run in the following concentration ranges: . fmol- . nmol for sm, . fmol- . nmol for cer, . fmol- . nmol for tag, . fmol- . nmol for dag, . fmol- . nmol for lpc, . pmol- . μmol for pc, . fmol- . nmol for lpe, . fmol- . nmol for pe, . fmol- . nmol for pg, . fmol- . nmol for pi, . fmol- . nmol for ps, . fmol- . nmol for pa. the lowest concentration with a signal above the estimated method limits (based on the formulas above) was taken as the lod and loq. the mean and standard deviation were calculated from replicates. linearity was represented by r , where loq was taken as the lowest calibrator concentration for each lipid standard.

spike and recovery and coefficient of variance: extraction recovery for the method was measured by comparing the peak area of matrix extract spiked with standards before and after extraction. for this, ul of lipid internal standard mix (standard mix: lipid extract resuspension volume :: : v/v) was used. the percentage recovery and relative standard deviation were calculated from biological replicates.
relative recovery = mean area of extracted sample with standard spiked before extraction / mean area of extracted sample with standard spiked after extraction

%relative standard deviation = standard deviation / mean analytical signal ×

the coefficient of variance (cv) of the method was determined by observing individual lipid species variation within batch. the intra-batch variation was assessed by analyzing technical replicates of lipids extracted from plasma. cv values were only calculated for those lipid species which had carry over of less than % and were present in at least replicates . inter day variability for each lipid species was determined by analyzing lipids on different days from a stock of pooled plasma. the cv values were reported for different days (n= , technical replicates) after sum-normalization within lipid class.

percentage cv = standard deviation / average intensity ×

alteration of plasma lipids due to vitamin b deficiency: study population: the study (which was part of a larger study) was designed to identify plasma lipids that were altered due to vitamin b deficiency. apparently healthy individuals were classified into two groups based on their plasma vitamin b levels. informed consent was obtained from the participants. the study was approved by the institutional ethical committee of csir-igib. individuals with vitamin b values less than pg/ml were considered to be vitamin b deficient and those with levels between - pg/ml were considered to be in the normal range. lipids from plasma were extracted as described above. for this study, plasma of individuals ( with b deficiency and with normal plasma vitamin b levels) was used. lipids that had a cv< % and that were altered by more than .
folds with p< . were considered to be significantly altered between the two groups.

data analysis: the .wiff files for relative quantitation were processed in multiquant . . , and ms/ms spectrum matching with the structure of putative lipid species (using the .mol file), for the identification of different lipid species, was done using peakview . . . statistical analysis was done using excel. figures were drawn using matlab (matlab, . version . . (r a), natick, massachusetts: the mathworks inc.), raw graph (https://rawgraphs.io) and graphpad prism version . .

results: we developed a scheduled-mrm method that can identify more than lipid species in a single mass spectrometric acquisition using a combination of variable-rtw and relative-dtw for each lipid species along with an optimized lc-gradient. initially, we generated a theoretical mrm library using lipidmaps (http://www.lipidmaps.org/) which consisted of lipid species and internal standards, belonging to the lipid classes. the total ion chromatogram is shown in figure a. the classes of lipids were analyzed in the positive or negative ion modes. in the positive ion mode, the [m+h]+ precursor ions were used for sm, cer, ce, while for neutral lipids (tag, dag, and mag) [m+nh ]+ precursor ions were considered. phospholipids (pls) were identified in negative ion mode, forming the [m-h]- precursor ion, except lpcs and pcs, for which [m+ch coo]- ions were considered. the variable-rtw and relative-dtw for different species were determined based on the intensity and width of the peaks obtained for each lipid species. for instance, in positive ion mode, sm ( : ) had a broader elution window ( . seconds) compared to ce ( : ) ( . seconds), but the signal intensity of ce ( : ) was lower than that of sm ( : ). thus, to collect a sufficient number of data points, a higher dwell time weight of . was applied for ce ( : ) as compared to . for sm ( : ) (figure b and c). furthermore, lpc ( : ) and lpe ( : ) had the same elution window of .
seconds but a dwell time weightage of was applied for lpc ( : ) as compared to . for lpe ( : ; figure d and e). a complete list of all parameters for each lipid species along with retention window and dwell weightage is given in supplementary table .

identification of isomers within lipid classes: in an attempt to identify different lipid isomers, we used customized approaches for various lipid classes. for tags, instead of using pseudo-transitions, we identified different tag species on the basis of sn-position by selecting a unique parent ion/daughter ion (q /q ) combination, which is based on the neutral loss of one of the sn-position fatty acyl chains (rcooh) and nh from the parent ion [m+nh ]+. for instance, the parent ion (q ) for tag : is . while the product ion (q ) was derived from the remaining mass of tag after loss of the fatty acid present at one of the sn-positions, like m/z . for tag ( : /fa : ) as shown in figure . using this approach we found tag species for tag ( : ) based on the composition of the fatty acid present at one of the sn-positions (figure a). furthermore, ms/ms through epi scan confirmed six of the isomers of tag : unambiguously (supplementary figure ). the mrm library used consists of tag species which belong to different categories of tag based on total chain length and unsaturation. further validation of q in the ms/ms experiment through ida-epi scan confirmed the structural characterization of the q ion with the ms/ms spectrum for putative tag species. using this method, we were able to identify a total of tag species from different categories of tag (figure a).
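the neutral-loss rule described above for tags (parent ion as the ammonium adduct, product ion after loss of ammonia and one free fatty acid) can be sketched numerically. this is only an illustration under stated assumptions: the function names are hypothetical, the masses are standard monoisotopic values, and the example species (tag 48:0 losing fa 16:0) is chosen here for illustration because the values used in the text are not legible.

```python
# Monoisotopic masses (Da) of the elements involved, plus the electron.
C, H, N, O = 12.0, 1.00782503, 14.0030740, 15.9949146
ELECTRON = 0.00054858


def fatty_acid_mass(carbons, double_bonds):
    """Neutral free fatty acid RCOOH, formula C_n H_(2n-2d) O_2."""
    return carbons * C + (2 * carbons - 2 * double_bonds) * H + 2 * O


def tag_mass(total_carbons, total_double_bonds):
    """Neutral triacylglycerol, formula C_(n+3) H_(2n-2d+2) O_6,
    where n and d are summed over the three acyl chains."""
    n, d = total_carbons, total_double_bonds
    return (n + 3) * C + (2 * n - 2 * d + 2) * H + 6 * O


def tag_transition(total_carbons, total_double_bonds, fa_carbons, fa_double_bonds):
    """Return (q1, q3) for one TAG / fatty-acid combination (hypothetical helper).

    q1 is the [M+NH4]+ precursor; q3 is q1 minus neutral NH3 and minus
    the neutral fatty acid RCOOH lost from one sn-position.
    """
    ammonium = N + 4 * H - ELECTRON   # NH4+ adduct
    ammonia = N + 3 * H               # neutral NH3
    q1 = tag_mass(total_carbons, total_double_bonds) + ammonium
    q3 = q1 - ammonia - fatty_acid_mass(fa_carbons, fa_double_bonds)
    return q1, q3


# Illustrative species only: TAG 48:0 losing FA 16:0.
q1, q3 = tag_transition(48, 0, 16, 0)
print(f"q1 = {q1:.2f}, q3 = {q3:.2f}")
```

as in the text, a transition built this way identifies the fatty acid lost but not which sn-position it came from.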
among these tags, we found tag ( : ) was the most abundant form in human plasma (figure b and supplementary table ). we identified isomers of tag ( : ), among which tag ( : /fa : ) was the most abundant in human plasma (figure b and supplementary table ). for phospholipids (pc, pe, pg, ps, pi, and pa), instead of the conventional method of using the head group loss in positive ion mode (e.g.: pc- : , . / . ), we used a modified approach in negative ion mode, via the loss of fatty acid, to identify the phospholipids at the fatty acid composition level. using this approach, we were able to identify isomers of phospholipids within a class, like pc : ‒ : , pc : ‒ : , pc : ‒ : and pc : ‒ : for pc : (supplementary figure a). further, epi scan for ms/ms confirmed the fragmented daughter ions for the identification of three pc ( : ) isomers (supplementary figure b, c and d). from the analysis of phospholipids belonging to phospholipid classes (pc, pe, pg, pi, ps, and pa) in the library, we were able to identify phospholipid species. among them, phospholipids (pc, pe, pg, pi, ps, pa) with chain length with unsaturation had the highest abundance (figure c and supplementary table ). within pls, pc : had the highest abundance (supplementary figure and supplementary table ). we observed three isomers of pc : , among which pc ( : / : ) was the most abundant (figure d, supplementary table ). epi scan confirmed the ms/ms identification of pls. we were also able to identify isomers of dag (e.g. dag : / : and dag : / : ) (supplementary table ). a list of all lipid species with their isomers and abundance in terms of area under the chromatogram is given in supplementary table . further validation of retention time through epi confirmed the ms/ms spectrum matching with putative lipid structure for other lipid classes.

method validation: limit of blank (lob), limit of detection (lod), limit of quantitation (loq), and linear range.
the raw analytical signal in blank was considered for establishing the lob, which was determined from the area under the chromatogram for the selected transition of each lipid standard (supplementary table ). the lod and loq were obtained from the raw analytical signal (area under the chromatogram) obtained by progressively diluting the lipid standards. the lod and loq were based on the average values obtained in replicates, reflecting inter day variability as mentioned in the materials and methods section. a representative graph of lod and loq for sm (positive mode) and pc (negative mode) is shown in figure a and b, while the values of lod and loq for all the species are provided in table . the lods for all lipid classes were in the range of . pmol/l – . pmol/l except dag, which was nmol/l. detection limits for sm, lpc, pe, and pg were found to be in the femtomolar range, while the rest were in the picomolar range. the lowest loq was detected for pg ( . pmol/l) and the highest for dag ( nmol/l). the linearity of the method was checked by defining the relationship between raw values of analytical signal for each lipid standard and its concentration in the presence of matrix (plasma). the linear range was determined by checking the performance limit from loq to the highest end of the concentration, based on the coefficient of determination (r ) value.

spike and recovery and coefficient of variation: to determine the percent recovery of all the lipid species, a known amount of lipid standards was added to plasma (matrix) before or after (spike) extraction of the lipids from the plasma. the raw area signals obtained from these two conditions were compared to determine the percentage recovery.
these experiments were performed on three different days and the average percent recovery of the lipid standards was determined (figure a and supplementary table ). to determine the coefficient of variation of all the lipid species, we extracted lipids from plasma pooled from individuals. for intra batch variations, the same sample was subjected to mass spectrometric analysis times. the coefficient of variation was calculated after sum normalization of raw values obtained within each class. to obtain the inter day variability, lipids were extracted from the same sample on different days. a total of , , lipid species were detected on day , day , and day respectively. the median cv of all the identified lipids on the three different days was . %, . %, and . % respectively. on day , out of lipid species, we observed lipid species with cv below %, of which had cv below % and had cv below % (figure b). we observed and lipid species with cv below % on day and day respectively. across all three days, phospholipids had cv< . the tags are a large class and species were measurable in plasma with cv falling below %. among the lysopls, lpc and lpe were detected consistently on different days, but other lysopls were of very low abundance and therefore had much lower reproducibility. a total of lysophospholipids out of had cv< %. in total we identified lipid species with cv< % on any of the three days, out of which lipid species were consistently detected on all days with cv< %. the detailed table of % cv for individual lipid species observed on different days is given in supplementary table .
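the recovery and cv calculations used in this validation can be sketched as follows. this is a minimal sketch, not the authors' implementation (the text states that statistics were done in excel): the function names and the data layout are hypothetical, the factor of 100 is assumed from the word "percentage", and species missing from any run are simply skipped, which is stricter than the replicate-presence rule in the methods.

```python
from statistics import mean, stdev


def percent_recovery(pre_spike_areas, post_spike_areas):
    """Mean peak area with standards spiked before extraction, relative to
    the mean area with standards spiked after extraction, in percent."""
    return 100.0 * mean(pre_spike_areas) / mean(post_spike_areas)


def sum_normalize(run):
    """Divide each species' area by the total area of its lipid class.

    `run` maps (lipid_class, species) -> raw area for one injection.
    """
    totals = {}
    for (lipid_class, _), area in run.items():
        totals[lipid_class] = totals.get(lipid_class, 0.0) + area
    return {key: area / totals[key[0]] for key, area in run.items()}


def percent_cv(runs):
    """Per-species %CV (sd / mean x 100) across sum-normalized runs.

    Needs at least two runs; species absent from any run are skipped.
    """
    normalized = [sum_normalize(run) for run in runs]
    shared = set(normalized[0]).intersection(*normalized[1:])
    cv = {}
    for key in shared:
        values = [run[key] for run in normalized]
        cv[key] = 100.0 * stdev(values) / mean(values)
    return cv


# Made-up areas for two replicate injections of one lipid class.
example_runs = [
    {("pc", "a"): 1.0, ("pc", "b"): 3.0},
    {("pc", "a"): 2.0, ("pc", "b"): 2.0},
]
example_cv = percent_cv(example_runs)
```

normalizing before computing the cv means that a change in total class signal between injections does not by itself inflate the per-species cv.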
lipidomics study in normal and vitamin b deficient human plasma: vitamin b is a micronutrient mainly sourced from animal products, deficiency of which has been reported to result in lipid imbalance. using this method, we attempted to identify lipid species that are altered due to vitamin b deficiency. there was no significant alteration in any of the lipid classes when taken as a whole between the two groups (supplementary table ). however, when individual lipid species within the classes were compared, we found that lipid species containing one of the types of omega fatty acid (fa : ) were significantly lower in plasma of vitamin b deficient individuals (figure a). in total, lipid species containing : fatty acids were significantly down regulated, two of tag and pc, one each from pe and pa. additionally, lipid species containing an omega fatty acid (fa : ) were significantly higher in the vitamin b deficient condition (figure b, supplementary table ). these results hint at the possibility of a lower ω- : ω- ratio in vitamin b deficient individuals.

discussion: lipids in general are known to be associated with the pathogenesis of various complex diseases . however, the exact role played by each lipid species has not been studied in detail, mainly due to the limitation in identifying individual lipid species in large scale studies. we report a single extraction, targeted mass spectrometric method using amide-hilic-chromatography and scheduled mrm with variable-rtw and relative-dtw which detects more than lipid species from lipid classes, including various isomers, in a single run of minutes. with this method, which covers most of the lipid species present in human plasma with - carbon atoms and - double bonds in the fatty acid chain, we could identify a considerably higher number of lipid species than those reported in previous large-scale lipidomics studies , , - .
in this method, the mrm transitions were monitored in a particular time segment, rather than performing scans for all the lipid species during the entire run. this strategy reduces the time required for identification of the multiple transitions. we improved the coverage by additionally optimizing the assigned dwell time weightage for each lipid species, which is required especially for medium and low abundant lipid species. the dwell time for each lipid species was customized and the dwell weightage was optimized based on lipid species abundance without affecting the target scan time in each cycle. this improved peak quality with good reproducibility. current methods for large-scale lipid analysis can only identify the lipid classes and fatty acid chains, but the structure specificity of lipid analysis is critical for studying the biological function of lipids. finding the composition of the fatty acyl chains with respect to sn-position is a major limitation in large scale lipidomics studies , . recently, using a combination of photochemical reaction (ozone-induced dissociation and ultraviolet photodissociation) with tandem ms, cao et al. reported the identification of isomers for tags and pls on the basis of sn-position and carbon-carbon double bond (c=c) . their identification also revealed the sequential loss of different fatty acyl chains based on sn-position, disclosing identification of different positional isomers . however, a single step identification of tag isomers in large scale studies remains a challenge due to the three fatty acyl chains with glycerol backbone, bearing no easily ionizable moiety , .
we have focused on identification of structural isomers based on sn-position using an lc-ms platform, without adding an extra step that would burden the analysis time and effort. we were able to detect structural isomers with respect to the fatty acyl chain at an sn-position, where the neutral loss of one of the sn-position fatty acyl chains (rcooh) and nh from the parent ion [m+nh ]+ makes their detection possible. detection was purely based on assigning a unique combination of q /q for each structural isomer of a tag species (figure a); however, one of the limitations of this method is the inability to assign the fatty acyl groups (sn , sn , or sn ) to their respective sn-positions. hence, the three fatty acyl chains are represented by adding up the number of carbon atoms and the unsaturation level (e.g., tag ( : )), and the identified fatty acid at one of the sn-positions (e.g., fa- : ) is represented as tag ( : /fa : ). the optimized q for phospholipids including pc, pe, pg, pi, ps, and pa was derived from the neutral loss of their fatty acid side chains in the negative ion mode (e.g., pc : ‒ : , pc : ‒ : , pc : ‒ : and pc : ‒ : for pc : ) (figure d). in total we were able to identify tag species which belong to different fatty acyl compositions (sn +sn +sn ) and pl species based on fatty acyl compositions (sn +sn ) (supplementary table ). the lod for various lipid species in our method was between . fmol/l – . pmol/l, which was better than or similar to previously reported lods utilizing different lc-ms platforms , , , , and similar to a previously reported large scale lipidomics method using supercritical fluid-scheduled mrm ( ‒ , fmol/l) . the loqs in previously reported methods were in the nmol to umol/l range, while we have observed much lower loqs ( . pmol/l to . pmol/l) . apart from this, the calculation of limits was based on the mean raw analytical signal and sd of the blank, which gives a more realistic picture of the method and avoids reporting falsely low detection limits.
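the limit calculation referred to here (mean blank signal plus a multiple of its standard deviation, with the lowest standard concentration above that threshold reported as the limit) can be sketched as below. this is a minimal sketch under stated assumptions: the function names are hypothetical, the multipliers are passed in as parameters because the values used in the methods are not legible in this text, and the numbers in the usage lines are made up.

```python
from statistics import mean, stdev


def method_limits(blank_signals, k_lob, k_lod, k_loq):
    """Return (lob, lod, loq) thresholds as mean(blank) + k * sd(blank).

    The multipliers are inputs on purpose: none are hard-coded here.
    """
    m = mean(blank_signals)
    sd = stdev(blank_signals)  # sample standard deviation (n - 1)
    return m + k_lob * sd, m + k_lod * sd, m + k_loq * sd


def lowest_level_above(levels, threshold):
    """Return the lowest concentration whose mean signal exceeds `threshold`.

    `levels` maps concentration -> list of replicate signals from the
    serial dilution. Returns None if no level clears the threshold.
    """
    for conc in sorted(levels):
        if mean(levels[conc]) > threshold:
            return conc
    return None


# Made-up blank signals, multipliers and dilution series, for illustration only.
lob, lod_threshold, loq_threshold = method_limits([10, 12, 14], 1, 3, 10)
dilution = {1: [13, 15], 5: [20, 22], 50: [40, 44]}
lod = lowest_level_above(dilution, lod_threshold)
loq = lowest_level_above(dilution, loq_threshold)
```

reporting the lowest measured concentration that clears the threshold, rather than the threshold itself, ties each limit to a signal that was actually observed in matrix.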
in our method, dag had the highest lod and loq of nmol/l and nmol/l respectively, which was still lower than in the previously reported methods for targeted analysis . the linearity of our method was found to be comparable to previous lipidomics methods , , . the recovery of lipid species in our method was in the range of . % - . %, except dag ( . %), which is within the generally accepted range for quantification and is comparable with other lipidomics studies , . the dag class is not frequently quantified in other published papers; we observed comparatively higher recovery for it because the concentration present in the lipid standard mix ( : of lipid standard master mix) used for the recovery test doesn’t fall in the linear range . a major challenge in lipidomics experiments has been the high variability in the signals, and even the study “shared reference materials harmonize lipidomics across ms-based detection platforms and laboratories” showed that most lipid species had large variability (cv) between % to % . however, variability for endogenous lipid species that were normalized to the corresponding stable isotope-labelled analogue was lower than % , . in this method, we used sum normalization (although we are not addressing batch effect in this study) and found that lipid species had a cv < % . overall, the median cv of our method ( . %, . %, and . %) was similar to or better than the previous reports , , , and we have also reported species-specific cv. it should be noted that most of the large scale lipidomics studies previously done report the median or average cv of the method but not the species-specific cv , , , .
lipidomics study in normal and vitamin b deficient human plasma: using the method developed, we identified lipid species that are altered in individuals with vitamin b deficiency. vitamin b is a cofactor of methylmalonyl coa mutase and controls the transfer of long-chain fatty acyl-coa into the mitochondria . deficiency of vitamin b results in accumulation of methylmalonyl coa, increasing lipogenesis via inhibition of beta-oxidation. in the last decade, several studies revealed that vitamin b deficiency causes alteration in the lipid profile through changes in lipid metabolism, either by modulating lipid synthesis or lipid transport . in particular, the effects of vitamin b on omega fatty acid and phospholipid metabolism have received much attention. khaire et al. found that vitamin b deficiency increased cholesterol levels but reduced docosahexaenoic acid (dha-omega ) . an imbalance in maternal micronutrients (folic acid, vitamin b ) in wistar rats increased maternal oxidative stress, decreased placental and pup brain dha levels, and decreased placental global methylation levels , . although various studies have shown that b deficiency results in an adverse lipid profile as well as pathophysiological changes linked to cad, type diabetes mellitus and atherosclerosis, very few studies have independently investigated the effect of vitamin b status on changes in human plasma lipids among an apparently healthy population - . importantly, the lipid species that are altered because of the vitamin deficiency are still not well understood. to our knowledge, this is the first study to identify lipids with significantly decreased ω- fatty acid ( : ) chains and increased ω- ( : ) chains, which might alter/increase the ω- to ω- fatty acid ratio in human plasma in relation to vitamin b deficiency and may promote development of many chronic diseases.
notably, this study demonstrated for the first time in humans that vitamin b deficiency may induce a lower level of synthesis or a higher rate of degradation of lipid species containing omega fatty acid (fa : ). most importantly, we found that although there was no significant alteration in the lipid classes, individual lipid species varied in vitamin b deficient individuals, clearly demonstrating the utility of identifying lipid species. the application of scheduled mrm with variable-rtw and relative-dtw enabled large-scale quantification of lipid species in a single run as compared to unscheduled/scheduled/dynamic mrm. with this combinatorial approach, we were able to detect more than lipid species in plasma, including isomers of tag, dag and pls. additionally, we validated the retention time through ms/ms analysis in ida-epi scan mode by matching fragmented daughter ions from the ms/ms spectrum to the putative lipid species structure. it should be noted that the mrms currently used were specific for plasma and may not be ideal for other biological systems. therefore, a separate mrm panel may need to be developed for each system. to the best of our knowledge this is the largest number of lipid species identified till date in a single experiment. a comprehensive identification of structural isomers in a large-scale lipid method proves to be critical for studying the important biological functions of lipids.

acknowledgement: the authors would like to thank dr. mainak dutta from bits dubai, mrs. akanksha singh and dr. christei hunter of sciex for their invaluable inputs and suggestions in shaping this study. akash kumar bhaskar and salwa naushin would like to thank csir for their fellowship.
references smilowitz, j. t. et al. nutritional lipidomics: molecular metabolism, analytics, and diagnostics. molecular nutrition & food research , - ( ). muro, e., atilla-gokcumen, g. e. & eggert, u. s. lipids in cell biology: how can we understand them better? molecular biology of the cell , - ( ). yáñez-mó, m. et al. biological properties of extracellular vesicles and their physiological functions. journal of extracellular vesicles , ( ). van meer, g., voelker, d. r. & feigenson, g. w. membrane lipids: where they are and how they behave. nature reviews molecular cell biology , - ( ). glomset, j. a. protein-lipid interactions on the surfaces of cell membranes. curr. opin. struct. biol , - ( ). ye, r., onodera, t. & scherer, p. e. lipotoxicity and β cell maintenance in obesity and type diabetes. journal of the endocrine society , - ( ). fu, s. et al. aberrant lipid metabolism disrupts calcium homeostasis causing liver endoplasmic reticulum stress in obesity. nature , - ( ). yang, m., zhang, y. & ren, j. autophagic regulation of lipid homeostasis in cardiometabolic syndrome. frontiers in cardiovascular medicine , ( ). beloribi-djefaflia, s., vasseur, s. & guillaumond, f. lipid metabolic reprogramming in cancer cells. oncogenesis , e -e ( ). wymann, m. p. & schneiter, r. lipid signalling in disease. nature reviews molecular cell biology , - ( ). quehenberger, o. & dennis, e. a. the human plasma lipidome. new england journal of medicine , - ( ). shevchenko, a. & simons, k. lipidomics: coming to grips with lipid diversity. nature reviews molecular cell biology , - ( ). sud, m. et al. lmsd: lipid maps structure database. nucleic acids research , d -d ( ). pradas, i. et al.
lipidomics reveals a tissue-specific fingerprint. frontiers in physiology , ( ). van meer, g. cellular lipidomics. the embo journal , - ( ). brügger, b., erben, g., sandhoff, r., wieland, f. t. & lehmann, w. d. quantitative analysis of biological membrane lipids at the low picomole level by nano-electrospray ionization tandem mass spectrometry. proceedings of the national academy of sciences , - ( ). wu, z., shon, j. c. & liu, k.-h. mass spectrometry-based lipidomics and its application to biomedical research. journal of lifestyle medicine , ( ). wenk, m. r. the emerging field of lipidomics. nature reviews drug discovery , - ( ). han, x. & gross, r. w. global analyses of cellular lipidomes directly from crude extracts of biological samples by esi mass spectrometry a bridge to lipidomics. journal of lipid research , - ( ). kirkwood, j. s., maier, c. & stevens, j. f. simultaneous, untargeted metabolic profiling of polar and nonpolar metabolites by lc‐q‐tof mass spectrometry. current protocols in toxicology , . . - . . ( ). takeda, h. et al. widely-targeted quantitative lipidomics method by supercritical fluid chromatography triple quadrupole mass spectrometry. journal of lipid research , - ( ). contrepois, k. et al. cross-platform comparison of untargeted and targeted lipidomics approaches on aging mouse plasma. scientific reports , - ( ). khan, m. j. et al. evaluating a targeted multiple reaction monitoring approach to global untargeted lipidomic analyses of human plasma. rapid communications in mass spectrometry , e ( ). dekker, b. reduce complexity by choosing your reactions. nature methods , - ( ). mao, c. et al. cloning and characterization of a mouse endoplasmic reticulum alkaline ceramidase an enzyme that preferentially regulates metabolism of very long chain ceramides. journal of biological chemistry , - ( ). song, j. et al. a highly efficient, high-throughput lipidomics platform for the quantitative detection of eicosanoids in human whole blood. 
analytical biochemistry , - ( ). weir, j. m. et al. plasma lipid profiling in a large population-based cohort. j lipid res , - , doi: . /jlr.p ( ). zhang, w. et al. online photochemical derivatization enables comprehensive mass spectrometric analysis of unsaturated phospholipid isomers. nature communications , - ( ). thomas, m. c., mitchell, t. w. & blanksby, s. j. ozonolysis of phospholipid double bonds during electrospray ionization: a new tool for structure determination. journal of the american chemical society , - ( ). baba, t., campbell, j. l., le blanc, j. y. & baker, p. r. structural identification of triacylglycerol isomers using electron impact excitation of ions from organics (eieio). journal of lipid research , - ( ). tabassum, r. et al. genetic architecture of human plasma lipidome and its link to cardiovascular disease. nature communications , - ( ). li, j. et al. large-scaled human serum sphingolipid profiling by using reversed-phase liquid chromatography coupled with dynamic multiple reaction monitoring of mass spectrometry: method development and application in hepatocellular carcinoma. journal of chromatography a , - ( ). liang, j. et al. a dynamic multiple reaction monitoring method for the multiple components quantification of complex traditional chinese medicine preparations: niuhuang shangqing pill as an example. journal of chromatography a , - ( ). rao, z. et al. development of a dynamic multiple reaction monitoring method for determination of digoxin and six active components of ginkgo biloba leaf extract in rat plasma. journal of chromatography b , - ( ). shah, i., petroczi, a., uvacsek, m., ránky, m. & naughton, d. p.
hair-based rapid analyses for multiple drugs in forensics and doping: application of dynamic multiple reaction monitoring with lc-ms/ms. chemistry central journal , ( ). andrade, g. et al. liquid chromatography–electrospray ionization tandem mass spectrometry and dynamic multiple reaction monitoring method for determining multiple pesticide residues in tomato. food chemistry , - ( ). jia, z.-x., zhang, j.-l., shen, c.-p. & ma, l. profile and quantification of human stratum corneum ceramides by normal-phase liquid chromatography coupled with dynamic multiple reaction monitoring of mass spectrometry: development of targeted lipidomic method and application to human stratum corneum of different age groups. analytical and bioanalytical chemistry , - ( ). xu, g., amicucci, m. j., cheng, z., galermo, a. g. & lebrilla, c. b. revisiting monosaccharide analysis–quantitation of a comprehensive set of monosaccharides using dynamic multiple reaction monitoring. analyst , - ( ). armbruster, d. a. & pry, t. limit of blank, limit of detection and limit of quantitation. the clinical biochemist reviews , s ( ). armbruster, d. a., tillman, m. d. & hubbs, l. m. limit of detection (lqd)/limit of quantitation (loq): comparison of the empirical and the statistical methods exemplified with gc-ms assays of abused drugs. clinical chemistry , - ( ). rower, j. e., bushman, l. r., hammond, k. p., kadam, r. s. & aquilante, c. l. validation of an lc/ms method for the determination of gemfibrozil in human plasma and its application to a pharmacokinetic study. biomedical chromatography , - ( ). van amsterdam, p. et al. the european bioanalysis forum community’s evaluation, interpretation and implementation of the european medicines agency guideline on bioanalytical method validation. bioanalysis , - ( ). medina, j. et al. single-step extraction coupled with targeted hilic-ms/ms approach for comprehensive analysis of human plasma lipidome and polar metabolome. metabolites , ( ). schoeny, h. et al. 
preparative supercritical fluid chromatography for lipid class fractionation—a novel strategy in high-resolution mass spectrometry based lipidomics. analytical and bioanalytical chemistry, - ( ). rampler, e. et al. simultaneous non-polar and polar lipid analysis by on-line combination of hilic, rp and high resolution ms. analyst , - ( ). cao, w. et al. large-scale lipid analysis with c= c location and sn-position isomer resolving power. nature communications , - ( ). wolrab, d., chocholoušková, m., jirásko, r., peterka, o. & holčapek, m. validation of lipidomic analysis of human plasma and serum by supercritical fluid chromatography–mass spectrometry and hydrophilic interaction liquid chromatography–mass spectrometry. analytical and bioanalytical chemistry, - ( ). triebl, a. et al. shared reference materials harmonize lipidomics across ms-based detection platforms and laboratories. journal of lipid research , - ( ). green, r. et al. vitamin b deficiency. nature reviews disease primers , - ( ). (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . saraswathy, k. n., joshi, s., yadav, s. & garg, p. r. metabolic distress in lipid & one carbon metabolic pathway through low vitamin b- : a population based study from north india. lipids in health and disease , ( ). khaire, a., rathod, r., kale, a. & joshi, s. vitamin b and omega- fatty acids together regulate lipid metabolism in wistar rats. prostaglandins, leukotrienes and essential fatty acids , - ( ). kulkarni, a. et al. effects of altered maternal folic acid, vitamin b and docosahexaenoic acid on placental global dna methylation patterns in wistar rats. plos one , e ( ). roy, s. et al. 
maternal micronutrients (folic acid and vitamin b ) and omega fatty acids: implications for neurodevelopmental risk in the rat offspring. brain and development , - ( ). adaikalakoteswari, a. et al. vitamin b deficiency is associated with adverse lipid profile in europeans and indians with type diabetes. cardiovascular diabetology , ( ). kumar, j. et al. vitamin b deficiency is associated with coronary artery disease in an indian population. clinical chemistry and laboratory medicine (cclm) , - ( ). mahalle, n., kulkarni, m. v., garg, m. k. & naik, s. s. vitamin b deficiency and hyperhomocysteinemia as correlates of cardiovascular risk factors in indian subjects with coronary artery disease. journal of cardiology , - ( ). (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . figure and table: figure chromatograms of the scheduled mrm method with variable-rtw and relative-dtw. a a total ion chromatogram of method consisting of lipid species and internal standards from lipid classes in positive or negative mode. b,c in positive ion mode, sm ( : )- / . has elution window of . seconds with dwell weight (b) and ce ( : )- . / . has elution window of . seconds with dwell weight . (c). d,e in negative ion mode, lpc ( : )- . / . and lpe ( : )- . / . has equal elution window ( . seconds) but lpe ( : ) has higher dwell weight ( . ) (d) compared to lpc ( : ) dwell weight ( ) (e). figure xic (extracted ion chromatogram) of nine isomers of tag ( : ). parent m/z for all was . while the product m/z was derived from the remaining mass (r +r with glycerol backbone) after the loss of fatty acid released from the parent ion. r +r can be any composition of fatty acid which sum-up to give product ion. 
dots of different colors represent different isomers confirmed through the ida-epi experiment (refer to supplementary figure ).

figure abundance of different lipids. (a) abundance of different tags on the basis of total chain length and unsaturation. (b) tag isomers detected from different categories of tag. (c) abundance of different phospholipids on the basis of total chain length and unsaturation. (d) abundance of phospholipids belonging to classes (pc, pe, pg, pi, ps, and pa); different dots of the same color represent isomers.

figure representative graphs from positive and negative ion mode showing lod, loq and coefficient of determination; the x- and y-axes were log transformed. (a) sm from positive ion mode and (b) pc from negative ion mode.

figure validation of the method. (a) spike and recovery of different lipid classes; blue bars represent the recovery of lipids when a known concentration of lipid standards was spiked during extraction, and green bars represent the reference (the same concentration of lipid standard spiked after extraction). (b) coefficient of variance on day where lipid species from lipid classes were detected (n= ).

figure significantly dysregulated lipid species in vitamin b deficiency. (a) significantly down-regulated omega fatty acid : in vitamin b deficiency. (b) significantly upregulated omega fatty acid : in vitamin b deficiency.

table . analytical validation of the method with lipid standards.
lipid class | ion mode | number of lipid species | internal standard | lod conc. (pmol/l) | loq conc. (pmol/l) | coefficient of determination (r²)
sm | esi+ | | sm (d : - : (d )) | . | . | .
ce | esi+ | | ceramide ( : ) | . | . | .
cer | esi+ | | | | |
tag | esi+ | | tag ( : - : (d )- : ) | . | . | .
dag | esi+ | | dag ( : - : (d )) | . | . | .
mag | esi+ | | | | |
lpc | esi- | | lpc ( : (d )) | . | . | .
pc | esi- | | pc ( : - : (d )) | . | . | .
lpe | esi- | | lpe ( : (d )) | . | . | .
pe | esi- | | pe ( : - : (d )) | . | . | .
lpg | esi- | | pg ( : - : (d )) | . | . | .
pg | esi- | | | | |
lpi | esi- | | pi ( : - : (d )) | . | . | .
pi | esi- | | | | |
lps | esi- | | ps ( : - : (d )) | . | . | .
ps | esi- | | | | |
lpa | esi- | | pa ( : - : (d )) | . | . | .
pa | esi- | | | | |
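The validation quantities named in the figure legends and table above (spike recovery, coefficient of variance, and the coefficient of determination on log-transformed axes) can be illustrated with a minimal sketch. The source does not state the exact formulas the authors used, so the conventional definitions are assumed here, and all function and argument names are hypothetical.

```python
import math
from statistics import mean, stdev


def percent_recovery(spiked_before_extraction, spiked_after_extraction):
    """Recovery (%) of a standard spiked before extraction, relative to the
    same amount spiked after extraction (taken as the reference)."""
    return 100.0 * spiked_before_extraction / spiked_after_extraction


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean * 100 (assumed definition)."""
    return 100.0 * stdev(values) / mean(values)


def log_log_r_squared(concentrations, responses):
    """R^2 of a straight-line fit of log10(response) against
    log10(concentration), i.e. both axes log transformed."""
    x = [math.log10(c) for c in concentrations]
    y = [math.log10(r) for r in responses]
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    # For simple linear regression, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)
```

With peak areas as inputs, `percent_recovery` compares the two spiking conditions shown in the validation figure, and `coefficient_of_variation` would be applied to replicate measurements of one lipid species.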
structural and mechanistic insights into the artemis endonuclease and strategies for its inhibition

yuliana yosaatmadja ⱡ, hannah t baddock ⱡ, joseph a newman , marcin bielinski , angeline e gavard , shubhashish m m mukhopadhyay , adam a dannerfjord , christopher j schofield , peter j mchugh *, opher gileadi *.

centre for medicines discovery, university of oxford, orcrb, roosevelt drive, oxford, ox dq, united kingdom; department of oncology, mrc-weatherall institute of molecular medicine, university of oxford, oxford ox ds, united kingdom; chemistry research laboratory, university of oxford, mansfield road, oxford, ox ta, united kingdom.

* to whom correspondence should be addressed. email: opher.gileadi@cmd.ox.ac.uk. correspondence may also be addressed to peter.mchugh@imm.ox.ac.uk.
ⱡ these authors contributed equally.

abstract

artemis (dclre c) is an endonuclease that plays a key role in the development of b- and t-lymphocytes and in dna double-strand break repair by non-homologous end-joining (nhej). artemis is phosphorylated by dna-pkcs and acts to open dna hairpin intermediates generated during v(d)j and class-switch recombination. consistently, artemis deficiency leads to radiosensitive congenital severe immune deficiency (rs-scid). artemis belongs to a structural superfamily of nucleases that contain conserved metallo-β-lactamase (mbl) and β-casp (cpsf-artemis-snm -pso ) domains. here, we present crystal structures of the catalytic domain of wild type and variant forms of artemis that cause rs-scid omenn syndrome. the truncated catalytic domain of artemis is a constitutively active enzyme with similar activity to a phosphorylated full-length protein. our structures help explain the basis of the predominantly endonucleolytic activity of artemis, which contrasts with the predominantly exonuclease activity of the closely related snm a and snm b nucleases.
the structures also reveal a second metal binding site in its β-casp domain that is unique to artemis. by combining our structural data with that from a recently reported structure, we were able to model the interaction of artemis with dna substrates. moreover, co-crystal structures with inhibitors indicate the potential for structure-guided development of inhibitors.

biorxiv preprint doi: https://doi.org/ . / . . . ; this version posted january , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by . international license (http://creativecommons.org/licenses/by/ . /).

introduction

nucleases hydrolyse the phosphodiester bonds of nucleic acids and are grouped into two broad classes: exonucleases and endonucleases. exonucleases are often non-sequence-specific, while endonucleases can be further grouped into sequence-specific endonucleases, such as restriction enzymes, and structure-selective endonucleases [ ]. artemis (also known as snm c or dclre c), along with snm a (dclre a) and apollo (snm b or dclre b), are human nucleases that fall into the extended structural family of metallo-β-lactamase (mbl) fold enzymes [ , ]. the n-terminal region of artemis is predicted to have a core mbl fold (aa – , – ) with an inserted β-casp (cpsf , artemis, snm and pso ) domain (aa – ). β-casp domains are present within the larger family of eukaryotic nucleic acid processing mbls and confer both dna/rna binding and nuclease activity [ ]. the c-terminal region of artemis mediates protein-protein interactions, contains post-translational modification (ptm) sites, directs subcellular localisation, and may modulate catalytic activity [ – ].
although snm a, snm b, and artemis are predicted to have similar core structures for their catalytic domains, each has distinct cellular functions and substrate specificities. while snm a and snm b/apollo are exclusively ' to ' exonucleases, the predominant activity of artemis is endonucleolytic [ , ], although a minor ' to ' exonuclease activity has been reported [ ]. human snm a localises to sites of dna damage, can digest past dna damage lesions in vitro, and is involved in the repair of interstrand crosslinks (icls) [ – ]. snm b/apollo is a shelterin-associated protein required for resection at newly-replicated leading-strand telomeres to generate the '-overhang necessary for telomere loop (t-loop) formation and telomere protection [ – ]. both snm a and snm b/apollo prefer ssdna substrates in vitro, with an absolute requirement for a free '-phosphate [ , ]. by contrast, artemis prefers hairpins and dna junctions as substrates for its endonuclease activity, although it is able to process ssdna substrates [ – ].

the endonuclease activity of artemis is responsible for hairpin opening in variable (diversity) joining (v(d)j) recombination [ ] and contributes to end-processing in canonical non-homologous end joining (c-nhej) dna repair [ – ]. v(d)j recombination is initiated by the recognition and binding of recombination-activating gene proteins (rag and rag ) to the recombination signal sequences (rsss) adjacent to the v, d, and j gene segments. upon binding, the rag proteins induce double-strand breaks (dsbs) and create a hairpin at the coding ends [ – ]. the ku heterodimer recognises the dna double-strand break and recruits dna-dependent protein kinase catalytic subunit (dna-pkcs) and artemis to mediate hairpin opening [ ]. following hairpin opening, the nhej machinery containing the xrcc /xlf(paxx)/dna-ligase iv complex is recruited to catalyse the processing and ligation reactions of the dna ends [ , , ].
v(d)j recombination is an essential process in antibody maturation [ , , ]. mutations in the artemis gene cause aberrant hairpin opening, resulting in severe combined immune deficiency (rs-scid), with sensitivity to ionising radiation due to impairment of the predominant dsb repair pathway in mammalian cells, nhej [ , , ], and another form of scid (omenn syndrome) associated with hypomorphic artemis mutations [ , ]. among the most common mutations leading to artemis loss-of-function are large deletions in the first four exons and a nonsense founder mutation, as found in navajo and apache native americans [ ]. in addition, missense mutations and in-frame deletions in highly conserved residues such as h , d and h can also abolish artemis protein function [ ].

owing to the key roles of artemis and related dsb repair enzymes in both programmed v(d)j recombination and non-programmed c-nhej dsb repair, they are attractive pharmacological targets for the radiosensitisation of tumours. here, we present a high-resolution crystal structure of the catalytic core of artemis (aa – ) containing both the mbl and β-casp domains. this reveals that artemis possesses a unique feature that is not present in snm a and snm b/apollo: a second metal binding site in its β-casp domain that bears a resemblance to classical cys his zinc finger motifs. we propose that this second metal coordination site is involved in artemis stabilisation and substrate specificity. we also present a model for artemis dna binding based on our data and another recently published structure.
the artemis dna model is compared with models of dna binding from related nucleases to reveal distinct features that define a role for artemis in the end-joining reaction. following development of an assay suitable for inhibitor screens, we identified drug-like molecules that could potentially inhibit both the artemis active site and its essential zinc finger-like motif.

material and methods

cloning and site directed mutagenesis of wt and mutant artemis (aa - )

the artemis mbl-β-casp domain (wt and mutant) encoding constructs were cloned into the baculovirus expression vector pbf- hzb, which combines an n-terminal his sequence and the z-basic tag (genbank accession number kp . ) for efficient purification and to promote solubility. the artemis gene was cloned using ligation independent cloning (lic) [ ]. site directed mutagenesis was carried out using an inverse pcr experiment whereby an entire plasmid is amplified using complementary mutagenic primers (oligonucleotides) with minimal cloning steps [ ]. using the high-fidelity and high-processivity enzyme herculase ii fusion dna polymerase (agilent), a pcr was performed to amplify the whole plasmid. the pcr product was then added to a kld enzyme mix (neb) reaction and incubated at room temperature for hour, prior to transformation into escherichia coli cells.

expression and purification of wt and mutant artemis with imac (aa - )

baculovirus generation was performed as previously described [ ]. recombinant proteins were produced in sf cells at x cells/ ml infected with . ml of p virus for wt and ml of p virus for mutants, respectively. infected sf cells were harvested h after infection by centrifugation ( x g, min). the cell pellet was resuspended in ml/ l lysis buffer ( mm hepes ph . , mm nacl, mm imidazole, % (v/v) glycerol and mm tcep), snap frozen in liquid nitrogen, then stored at − °c for later use. thawed cell aliquots were lysed by sonication. the lysates were clarified by centrifugation ( , g, min), then the supernatant was passed through a . μm filter (millipore) and loaded onto an immobilised metal affinity chromatography (imac) column (ni-nta superflow cartridge, qiagen) equilibrated in lysis buffer. the immobilised protein was washed with lysis buffer, then eluted using a linear gradient of elution buffer ( mm hepes ph . , mm nacl, mm imidazole, % v/v glycerol, and mm tcep). the protein containing fractions were pooled and passed through an ion exchange column (hitrap® sp ff, ge healthcare life sciences) pre-equilibrated in sp buffer a ( mm hepes ph . , mm nacl, % (v/v) glycerol and mm tcep). the protein was eluted using a linear gradient of sp buffer b (sp buffer a with m nacl), and fractions containing the tag-free artemis were identified by electrophoresis. artemis containing fractions were pooled and dialysed overnight at °c in sp buffer a supplemented with recombinant tobacco etch virus (tev) protease for cleavage of the his-zb tag. the protein was subsequently loaded onto an ion exchange column (hitrap® sp ff, ge healthcare life sciences) pre-equilibrated in sp buffer a to remove the his-zb tag and uncleaved protein. the protein was eluted using a linear gradient of sp buffer b, and fractions containing the tag-free artemis were identified by electrophoresis. artemis-containing fractions from the sp column elution were combined and concentrated to ml using a kda mwco centrifugal concentrator. the protein was then loaded on to a superdex increase / gl equilibrated with sec buffer ( mm hepes ph .
, mm nacl, % (v/v) glycerol, mm tcep). mass spectrometric analysis of the purified proteins revealed masses of . da, . da, . da, . da for wt, h a, d a and h d proteins, respectively. the calculated masses are . , . , . and . , respectively, all within . da of the measured masses.

expression and purification of wt truncated artemis catalytic domain without imac (aa - )

the truncated artemis protein was expressed and purified in a similar manner as described above, except for the first purification step. we used a ml hitrap® sp fast flow (ge healthcare) column as the first step of purification. following an overnight tev cleavage, the protein was subjected to a second ion exchange step ( ml hitrap® sp fast flow (ge healthcare)) for the removal of the z-basic protein tag. the protein was further purified by size exclusion chromatography (hiload® / superdex® ).

cloning, expression and purification of full-length wt artemis (aa - )

the full-length artemis encoding construct was cloned into pfb-ct hf-lic, a baculovirus expression vector containing a c-terminal his and flag tag. pfb-ct hf-lic was a gift from nicola burgess-brown (addgene plasmid # ; http://n t.net/addgene: ; rrid: addgene_ ). as for the truncated protein, the full-length artemis gene was also cloned using ligation independent cloning (lic) [ ]. the baculovirus mediated expression of the full length dclre c/artemis gene was performed in a manner similar to that used for the truncated protein. however, instead of infection with . ml of p virus, .
ml of p virus was used to infect sf cells at x cells/ ml for the expression of the full-length artemis construct. cell harvesting and the initial imac purification steps were performed as described for the catalytic domain. following imac chromatographic purification, tev cleavage overnight in dialysis buffer ( mm hepes ph . , . m nacl, % glycerol and mm tcep) gave protein which was then passed through a ml ni-sepharose column; the flowthrough fractions were collected. the artemis protein was then concentrated using a centrifugal concentrator (centricon, mwco kda) before loading on a superdex s hr / gel filtration column in dialysis buffer. fractions containing purified artemis protein were pooled and concentrated to mg/ml.

electrospray mass spectrometry (esi-qtof)

reversed-phase chromatography was performed in-line prior to mass spectrometry using an agilent uhplc system (agilent technologies inc., palo alto, ca, usa). concentrated protein samples were diluted to . mg/ml in . % formic acid and µl was injected on to a . mm x . mm zorbax µm sb-c guard column housed in a column oven set at °c. the solvent system consisted of . % formic acid in ultra-high purity water (millipore) (solvent a) and . % formic acid in methanol (lc-ms grade, chromasolve) (solvent b). chromatography was performed as follows: initial conditions were % a and % b at a flow rate of . ml/min. a linear gradient from % b to % b was applied over seconds. elution then proceeded isocratically at % b for seconds, followed by equilibration at initial conditions for a further seconds. protein intact mass was determined using an electrospray ionisation quadrupole time-of-flight mass spectrometer (agilent technologies inc., palo alto, ca, usa). the instrument was configured with the standard esi source and operated in positive ion mode. the ion source was operated with the capillary voltage at v, nebulizer pressure at psig, drying gas at °c and drying gas flow rate at l/min.
the instrument ion optic voltages were as follows: fragmentor v, skimmer v and octopole rf v.

protein crystallisation and soaking

artemis (pdb: tt ) was crystallised using the sitting drop vapour diffusion method by mixing nl protein with nl crystallisation solution comprising . m ammonium chloride, % (v/v) peg . crystals grew after weeks and reached maximum size within weeks. an unliganded crystal was cryoprotected with the mother liquor supplemented with % (v/v) ethylene glycol solution and flash frozen in liquid nitrogen. the non-imac purified artemis (pdb: af ) was crystallised in a similar manner, with the addition of nl of crystal seed solution obtained from a previous crystallisation experiment. the crystals were grown in a solution comprising . m ammonium chloride and % (v/v) peg at °c. crystals grew after one day and reached a maximum size within one week. artemis variants (mutants h a and h d) were crystallised using the sitting drop vapour diffusion method by mixing nl protein with nl crystallisation solution comprising . m sodium citrate ph . , % peg , while the d a variant was crystallised in . m ammonium acetate, . m bis-tris ph . , % peg . all artemis variants were crystallised in the presence of nl of crystal seed solution obtained from a previous crystallisation experiment. crystals grew after one day at °c and reached maximum size within one week.

data collection and refinements

data were collected at diamond light source i , i , or i beamlines.
diffraction data were processed using dials [ ] and structures were solved by molecular replacement using phaser [ ] and the pdb coordinates q a. model building and the addition of water molecules were performed in coot [ ] and structures were refined using refmac [ ]. data collection and refinement statistics are given in table i. the x-ray fluorescence data were collected at diamond light source i ( tt ) using % transmission and . ev, and i ( af ) using % transmission and . ev (suppl. figure ).

generation of -radiolabelled substrates

pmol of single-stranded dna (eurofins mwg operon, germany) were labelled with . pmol of α- p-datp (perkin elmer) by incubation with terminal deoxynucleotidyl transferase (tdt, u; thermofisher scientific) at °c for hour. this solution was then passed through a p micro bio-spin chromatography column (biorad), and the radiolabelled dna was annealed with the appropriate unlabelled oligonucleotides ( : . molar ratio of labelled to unlabelled oligonucleotide; see supplementary table for sequences) by heating to °c for min, and cooling to below °c in annealing buffer ( mm tris-hcl; ph . , mm nacl, . mm edta).

gel-based nuclease assays

standard nuclease assays were carried out in reactions containing mm hepes-koh, ph . , mm kcl, mm mgcl , . % (v/v) triton x- , % (v/v) glycerol (final volume: μl), and the indicated concentrations of artemis. reactions were started by the addition of dna substrate ( nm), incubated at °c for the indicated time, then quenched by addition of μl stop solution ( % formamide, mm edta, . % (v/v) xylene cyanole, .
% (v/v) bromophenol blue) with incubation at °c for min. reaction products were analysed by % denaturing polyacrylamide gel electrophoresis (made from a % solution of : acrylamide:bis-acrylamide (biorad) and m urea (sigma aldrich)) in x tbe (tris-borate edta) buffer. electrophoresis was carried out at v for minutes; gels were subsequently fixed for minutes in a % methanol, % acetic acid solution, and dried at °c for two hours under a vacuum. dried gels were exposed to a kodak phosphor imager screen and scanned using a typhoon instrument (ge).

fluorescence-based nuclease assay

the protocol of lee et al. [ ] was adapted for structure-specific endonuclease activity. a ssdna substrate was utilised containing a ’ fitc-conjugated t and a ’ bhq- (black hole quencher)-conjugated t (suppl. table ). as the fitc and bhq- are located proximal to one another, the intact substrate does not fluoresce prior to endonucleolytic incision. following endonucleolytic incision by dclre c/artemis, the fitc is uncoupled from the bhq- and fluorescence increases. inhibitors (at increasing concentrations) were incubated with artemis for minutes at room temperature, before the reaction was started with the addition of dna substrate. assays were carried out in a -well format, in a μl reaction volume. the buffer was the same as for the gel-based nuclease assays, the artemis concentration was nm, and the dna substrate was at nm. fluorescence spectra were measured using a pherastar fsx (excitation: nm; emission: nm) with readings taken every sec, for min, at °c.

results

human artemis (snm c or dclre c) has a core catalytic fold similar to snm a and apollo/snm b

the core catalytic domain of artemis (aa – ) was produced in baculovirus-infected sf cells fused to a highly basic his -zb tag, which confers tight binding to cation exchange columns. the protein was purified using immobilised metal affinity chromatography (imac) on a nickel-sepharose column as the initial step.
subsequent preparations were performed without the use of imac, to avoid the introduction of ni + ions during purification. artemis protein was purified as detailed in the methods & materials, and crystals were subsequently grown and diffracted to . Å resolution (table ); the structure was solved using a structure of snm a (pdb coordinates q a) as the molecular replacement model. the resultant artemis structure (pdb coordinates af ) contains a single molecule in the asymmetric unit, with two zinc ions coordinated at the active site. the metal ions were identified using x-ray fluorescence (xrf) analysis during data collection at the diamond light source. when using the protein purified using imac, the first zinc ion in the active site can be replaced by a nickel ion (pdb: tt ) (figure e). the presence of the nickel ion was also confirmed in the crystal by xrf. the x-ray fluorescence analysis of the metal ions present in the structures is shown in suppl. figure . this metal ion coordination pattern has been observed with other members of the family, such as snm a and snm b/apollo [ , ]. the overall fold of the human artemis protein catalytic core is very similar to that of human snm a and snm b/apollo ( Å rmsd). it has the key structural characteristics of human mbl fold nucleases, with the di-metal containing active site interfaced between the mbl and β-casp domains (figure a). as anticipated, the mbl domain (figure a and b, in pink) of artemis has the typical α/β-β/α sandwich mbl fold [ ] and contains all of the highly conserved motifs – (figure c and d; and sequence alignment, suppl.
figure ) which are typical for the whole mbl superfamily, and motifs a–c, which are typical of the β-casp fold containing family [ , , , ]. motifs – (figures d and c) are responsible for metal ion coordination in both dna and rna processing mbl enzymes [ ]. as previously observed in crystal structures of human snm a and apollo/snm b, artemis can coordinate one or two metal ions in its active site. one zinc ion (zn ) in the active site is coordinated by four residues (his , his , his , and asp ) and two water molecules (h o and ) in an octahedral manner (figure c). the second zinc ion (zn ) was refined with % occupancy and is coordinated by three residues (asp , his , and asp ) and two water molecules (h o and ). the low occupancy of the second zinc ion, together with the two conformations ( . occupancy for each conformation) observed for asp (figure e), suggests that this site binds a metal ion less tightly than the zn site, consistent with studies on other human mbl fold nucleases [ , baddock et al., ]. the structure of human snm a (pdb: ahr) [ ] was solved with a single zinc ion coordinated in the active site (figure a). by contrast, snm b structures solved with a bound amp (baddock et al., accompanying paper) showed that both metal ions are positioned to coordinate the phosphate group of the amp in an octahedral manner (figure b). in summary, the octahedral coordination sites for the first zinc ion are contributed by three histidines, one aspartate residue, and either water molecules or a phosphate oxygen of the substrate; the second metal ion is more weakly coordinated in the snm protein family, by one histidine and two aspartates, with the remaining three positions occupied by water or a phosphate oxygen of the dna substrate. this can explain the partial occupancy of the second zinc position in snm enzyme structures, where full occupancy may be achieved only in the presence of substrate. we propose that artemis would
coordinate a phosphate group of its substrate in a similar manner. the structure of human cpsf- (pdb: i t), an rna processing nuclease, with two active site bound zinc ions and a phosphate molecule [ ], shows that the two zinc ions are coordinated in a geometry very similar to that of the human mbl dna processing enzymes [ , ]. a striking difference between the mbl rna and dna nucleases is that the second metal ion (m ) in the rna processing nucleases is coordinated by an additional histidine residue (his for cpsf- ) [ ] that is absent in the dna processing enzymes (figure d).

the structure of artemis reveals a novel zinc-finger like motif in the β-casp domain

proteins with a β-casp fold form a distinct sub-group within the mbl-fold superfamily that specifically acts on nucleic acids [ ]. artemis’ β-casp domain comprises residues – and is the second globular domain (figure a, shown in white) in the catalytic region; it is inserted within the artemis mbl fold sequence between the small α-helices and (figure b). the β-casp domain has been proposed to facilitate substrate recognition and binding in the nucleic acid processing mbl fold-containing family of enzymes [ , ]. another metal ion coordination site, unique to artemis, is present in the β-casp domain, with similarity to the canonical cys his zinc-finger motif [ , ]. many dna binding proteins, including transcription factors and a substantial number of dna repair factors (including those involved in nhej), possess the classical cys his zinc finger motif, which serves as a structural feature stabilising the dna binding domain [ – ].
a typical cys his zinc coordinating finger (figure a) has a ββα motif, wherein the zinc ion is coordinated between an α-helix and two antiparallel β-sheets. the zinc ion confers structural stability and hydrophobic residues located at the sides of the zinc coordination site enable specific binding of the zinc finger in the major groove of the dna [ , , , ]. similar to the canonical cys his zinc finger motif, the zinc ion coordination in artemis’ β-casp domain adopts a tetrahedral geometry, with coordination by two cysteine (cys and cys ) and two histidine (his and his ) residues (figure b). however, in the case of artemis the metal ion coordination site is sandwiched between two β-sheets instead of an α-helix and two antiparallel β-sheets. almost all the residues in the zinc-finger like motif (his , cys , and cys ) are unique to artemis (sequence alignment suppl. figure ), with only his being well conserved within the snm -family. however, the four residues that form the zinc-finger like motif are highly conserved in artemis across different species (from human to marine sponge), implying functional importance (sequence alignment suppl. figure ). consistently, substitutions of his and his (h n and h l), two of the zinc coordinating residues in the β-casp domain of artemis, cause rs-scid in humans [ , , ]. patients with these inherited mutations suffer from impaired v(d)j recombination, leading to underdeveloped b and t lymphocytes. the importance of histidine has been highlighted by de villartay et al. [ ], who showed that the full-length h a artemis variant is unable to carry out v(d)j recombination in vivo and has no discernible endonucleolytic activity in vitro.

comparison of artemis structure with wo and wnl

during the preparation of this manuscript a structural study on the catalytic core of artemis was published [ ]. this study reported two artemis structures (pdb: wo and wnl) that are similar to our artemis structure (pdb code af ) (backbone rmsds of . Å and . Å respectively), with identical relative positioning of the mbl and β-casp domains (figure a and suppl. figure a). the only significant difference was that whilst we refined our structure with two zinc ions in the active site, both of the crystal forms reported by karim et al. were modelled with a single active site zinc ion (zn ), reinforcing the proposal of weaker metal ion binding at the zn site.

re-analysis of the wo and wnl structures

an unusual aspect of the karim et al. structures is that both crystal forms were obtained in the presence of dna and were reported to require dna for their growth; the crystals showed a fluorescence signal supporting the presence of dna (the oligonucleotides used contained a cyanine dye fluorophore), yet neither of the models presented contains dna. the authors referred to some broken stacking electron density in wnl in a solvent channel and a patch of unsolved density approaching the active site in wo , but state that the dna “did not bind to the protein in a physiological way, and likely bound promiscuously to promote crystallization” [ ]. we performed a careful re-examination of these structures, looking closely at the residual electron density. for the wnl structure we were able to locate a distorted duplex dna of around base pairs which we propose may be the product of duplex annealing of the oligonucleotide used for crystallization (a semi-palindromic -mer that was designed to form a hairpin with phospho-thioate linkages in the single-stranded region) (suppl. figure b).
for this structure, we are in general agreement with karim et al. that the dna does not appear to make meaningful interactions with the protein that inform on the mechanism of nuclease activity, although this mode of association with dna may possibly be relevant to alternative binding modes relating to higher order complexes containing artemis. by contrast, for the wo structure we were able to confidently build a dna molecule that contacts the artemis active site in a manner that we believe to be relevant to the artemis nuclease activity. our model contains an -nucleotide -single-stranded extension with a short -base pair region of duplex dna that reaches into the artemis active site, making close contacts with the metal ion centre in a manner consistent with the proposed catalytic mechanism (figure b). the sequence of the longest strand corresponds to the -nucleotide cy- labelled strand (cy -gcgatcagct) with some residual density at the -end that may be attributed to the cyanine fluorophore, which we did not include in our model. the complementary strand used for crystallization was -nucleotides long and was intended to produce a - overhang, but only two bases and three phosphates could be located in the density. the abrupt manner in which the electron density apparently disappears from either end of this strand suggests that this is the product of a cleavage reaction, although it is possible that the remaining nucleotides are not located due to disorder.
the analysis of electron density at this site is complicated by the proximity to a crystallographic -fold symmetry axis, which brings a symmetry copy of the dna molecule into a position where atoms partially overlap and the extended ’ strands form a pseudo duplex (suppl. figure a). the occupancy of the entire dna molecule is thus limited to . , and the lower occupancy is reflected in the electron density map, which requires a lower contour level than would usually be applied (suppl. figure b). after carefully building and refining the dna bound model described above, significant positive electron density was revealed for the second metal ion (zn site), which we also included in the model with the same occupancy ( . ) as the dna. our model was refined to similar crystallographic r-factors as wo and has been deposited with pdb accession number abs (refinement statistics are given in table i).

model for artemis dna binding

using the crystallographically observed dna as a template, we were able to make a model for artemis binding to a longer section of double-stranded dna by complementing unpaired bases on the single-stranded dna overhang with canonical base pairs, whilst maintaining acceptable geometry of the sugar phosphate backbone (figure a). the duplex section of this model deviates slightly from the ideal b-form geometry [ ], in a manner that is reminiscent of certain transcription factor dna complexes [ , ]. we have also extended the metal ion contacting strand by three nucleotides to form a -overhang; the positioning of the overhang nucleotides is more speculative, nevertheless it was possible to avoid clashes with protein residues whilst maintaining relaxed geometry (figure c).
in the extended dna complex model, artemis contacts both strands of the dna model in several areas; notably a single phosphate lies above the di-metal ion bearing active site and ligates to both metal ions in the same manner as observed in structures of related enzymes with phosphate or phosphate-containing compounds (baddock et al. ) [ ]. the two downstream nucleotides on this strand pass close to the protein surface, forming possible interactions with both the main chain of asp and sidechain of lys , whilst subsequent nucleotides are not close to the protein (figure a). the overhang portion of this strand continues with a slightly altered trajectory, potentially contacting artemis in the vicinity of the cleft separating the mbl and β-casp domains, with the potential to form favourable interactions with both positively charged (arg ) and aromatic residues (phe and trp ) (figure c). the complementary strand forms interactions with the protein via backbone contacts that span a -nucleotide stretch between - and -bases from the -terminus that contact positively charged sidechains in the mbl domain (lys , lys , arg , and lys ) (figure a). the -end of this strand apparently terminates directly above a cluster of polar or positively charged residues in the β-casp domain (lys , lys , asn ) (figure b). whilst the experimental (as used in co-crystallisation) dna substrate and our model both contain a '-hydroxyl group, the model implies that addition of a '-phosphate could be accommodated and may be expected to make favourable interactions with the basic cluster of residues.
thus, our model illustrates a preferred binding mode for artemis for dna with a '-overhang binding at the junction between double- and single-stranded regions, and the expected product of this reaction would be a blunt ended dna with a '-phosphate. in the case of hairpin dna substrates our model indicates the possibility for artemis to accommodate a loop connecting the two strands, possibly of around -nucleotides or more, with the cleavage product being dna with a '-overhang cleaved from the last paired base of the duplex.

comparison of the artemis dna binding mode with that of other nucleases

we have recently determined the structure of snm b/apollo in complex with two deoxyadenosine monophosphate nucleotides and, through a similar process of extrapolation to that outlined above, we have independently built a model for snm b binding to dna containing a '-overhang (one of its preferred substrates) (baddock et al., accompanying paper). the overall mode of dna binding is similar in the two models (figure ), with the two dna duplexes being roughly parallel and forming contacts to similar regions on the mbl domain. the most important differences lie in the nature of the contacts formed to the active site and the paths of the various overhangs. in the snm b model extensive contacts are made to the -phosphate in a well-defined phosphate binding pocket. both human snm a and snm b are exclusively -phosphate exonucleases, with most of these phosphate binding residues being highly conserved (sequence alignment suppl. figure in yellow) (baddock et al., accompanying paper). interestingly, artemis lacks these key phosphate binding residues and the -phosphate binding pocket of snm a and snm b. instead, this pocket in artemis is partially filled by the side chain of phe .
these contacts appear to define a high-affinity binding pocket exclusively for the -phosphate of the dna terminus, thus explaining the major differences in nuclease activities within the family, i.e., snm a and snm b being exonucleases and artemis being an endonuclease. further differences between artemis and snm b/snm a are found in the loop connecting β-strands g and h (using artemis numbering), which in artemis is significantly longer and occupies a different position contacting residues in the mbl domain (figure b), compared to the loops in snm b and snm a that form part of the phosphate binding pocket and make potential contacts with the '-overhang. this loop displacement in artemis may contribute to its ability to accommodate dna substrates with either -overhangs or hairpins, thus facilitating its function as a structure specific endonuclease. interestingly, the surface of the artemis interface between the mbl and β-casp domains contains a belt of positively charged residues (figure a). these positive surface charges are proposed to facilitate productive dna binding to the active site, consistent with our dna binding model. one of the striking differences, at least in the available structures, is that the active site of artemis is more open compared to those of snm a or snm b. this openness may reflect an ability to accommodate different substrate conformations including hairpins, '- and '-overhangs, as well as dna flaps and gaps.
both human snm a and snm b appear to have a more sequestered active site that would only fit a single strand of dna, which is consistent with previous findings on their preferred substrate selectivity [ ].

biochemical characterisation of truncated artemis catalytic domain (aa - )

to investigate the activity of our different versions of recombinant artemis, we performed nuclease assays using radiolabelled dna substrates. we compared the catalytic domain purified using imac (which contained ni + in the active site) with protein purified using ion exchange (and avoiding imac), which contained predominantly zn +. we also tested the activity of full-length phosphorylated artemis. the results show that both truncated enzymes have identical activities, which are also very similar to that of the full-length enzyme (suppl. figure ). one notable difference between our full-length protein and that reported by ma et al. [ ] is that our full-length protein is active in the absence of dna-pkcs. intact protein mass spectrometric analysis of our full-length protein shows that the protein has undergone up to five phosphorylation events (suppl. figure ). poinsignon et al. have shown that artemis is constitutively phosphorylated in cultured mammalian cells and is the target of additional phosphorylation in response to induced dna damage [ ]; it is interesting that the capacity to phosphorylate artemis to produce an active form is also conserved in insect cells. we observed exonuclease activity with full-length artemis at nm (suppl. figure ), though this was weak compared to its endonuclease activity at the same concentration. we observed no exonuclease activity for the truncated artemis construct; it is possible that phosphorylation alters the balance between endonuclease and exonuclease activity, though the biological relevance of this, if any, remains to be validated. as mentioned above, both human snm a and snm b require a '-phosphate for their activity [ , , ].
to investigate whether there is a similar requirement for artemis, we tested the activity of truncated artemis against single-stranded and overhang dna substrates with different '-end groups, including a phosphate, hydroxyl group, and biotin groups (figure a). the results imply that, at least under the tested conditions, artemis is agnostic to the different end modifications, exhibiting comparable digestion of all substrates. extensive evidence demonstrates that full-length artemis in complex with dna-pkcs has structure specific endonuclease activity [ , , ]. these studies reported that artemis can digest substrates including overhangs, hairpins, stem-loops, and splayed arms (pseudo-y). to investigate the activity of the truncated artemis catalytic domain (aa – ) we performed nuclease assays using a variety of radiolabelled dna substrates (figure b). the results show that truncated artemis has substrate specific endonuclease activity, with a preference for single-stranded dna substrates, and those that contain single stranded character (e.g. ’- and ’-overhangs, splayed arms, and a lagging flap structure), compared with double stranded dna structures (e.g. dsdna and a replication fork). this is in accordance with previous research, where artemis has been reported to cleave around ssdna-to-dsdna junctions in dna substrates (chang et al.). the truncated artemis catalytic domain also exhibits hairpin opening activity, in accordance with what has previously been reported (suppl. fig. ).
on a duplex substrate (ym from ma et al. [ ]) with a nt hairpin region, artemis cleaves adjacent to the hairpin, consistent with previous data. it is clear that the truncated form of artemis exhibits nuclease activity closely comparable to the phosphorylated full-length artemis protein [ ], indicating that the structural studies presented here reveal mechanistic insights of direct relevance to the dna-pkcs-associated form of artemis that engages in end-processing reactions in vivo.

structural and biochemical characterisation of artemis point mutations

previous site-directed mutagenesis studies by pannicke et al. targeting the metal ion coordinating residues in the active site (d n, h a, h a, d n) of full-length artemis (aa – ) established the importance of active site motifs – for activity [ ]. each of these substitutions markedly reduced or abolished artemis’ ability to carry out its role in v(d)j recombination in vitro. we mutated, expressed, purified, and crystallised three forms of truncated artemis (aa – ) with substitutions in several of these metal ion co-ordinating residues, i.e. d a, h a, and the omenn syndrome patient mutation h d [ , ]. we found that the overall architecture of the three variants is almost identical to the wt (figure a). the d a structure retains one zn ion (zn ) in the active site whilst losing the second (zn ) (figure b). both the h a and h d variants additionally exhibited loss of the zn ion. all three variants retained the zn ion in the zinc finger-like motif of the β-casp domain. the positions of the active site residues and the surrounding residues in the d a variant superposed perfectly with wt artemis. as previously mentioned, asp can adopt two conformations as seen in the tt structure, noting that the coordinated zinc ion is generally present at about % occupancy in both of the wt artemis structures (pdb tt and af ).
therefore, it is unsurprising that mutation of asp to alanine results in the loss of zn . differential scanning fluorimetry (dsf) experiments were carried out to investigate the stability of the three variants. dsf analysis showed that the d a variant has similar thermal stability to the wt, suggesting that the protein is stable and folded in the presence of a single metal ion in the active site, whilst both h a and h d variants were substantially destabilized with Δtm around - °c compared to the wt artemis (suppl. figure ). histidines and are the first two histidine residues in the hxhxdh motif (motif ) in the mbl domain. their role is to coordinate the first metal ion (zn site) in the catalytic site. in the absence of metal ions in the catalytic site, the loop comprising residues – moves away from the active site (figure c and d). another small rearrangement occurs in helix α (residues – ) of the mbl domain. in both the h a and h d variants, helix α moves slightly closer toward strand β , compared to the wt and d a variant. surprisingly, the biggest rearrangement occurs in β-strand e (residues – ) and α-helix e (residues – ), both located near the zinc finger motif in the β-casp domain (figure a). in the h a and h d variants, both β-strand e and α-helix e shifted upward and away from the zinc-finger like motif. these conformational changes may suggest some allosteric regulation in terms of substrate binding and catalytic activity of the enzyme.
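as a rough illustration of how a Δtm of the kind quoted above can be derived from dsf melt curves, the sketch below takes the melting temperature as the temperature at which the first derivative of fluorescence with respect to temperature is largest, and then subtracts the wild-type value from the variant value. this is a common convention and not a description of the analysis actually used in this study; the function names, the synthetic sigmoidal curves and the melting temperatures fed into them are hypothetical and serve only to make the example runnable.

```python
import math


def melting_temperature(temps, fluorescence):
    """Return the temperature at which dF/dT (central difference) is largest."""
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least three paired temperature/fluorescence readings")
    best_temp, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_temp, best_slope = temps[i], slope
    return best_temp


def delta_tm(temps, wt_curve, variant_curve):
    """Tm(variant) - Tm(wild type); negative values indicate destabilisation."""
    return melting_temperature(temps, variant_curve) - melting_temperature(temps, wt_curve)


def synthetic_melt_curve(temps, tm, width=1.5):
    """Hypothetical two-state unfolding curve, used only as placeholder data."""
    return [1.0 / (1.0 + math.exp((tm - t) / width)) for t in temps]


# Placeholder data: the Tm values below are arbitrary, not taken from the paper.
temps = [25.0 + 0.5 * i for i in range(121)]
wt = synthetic_melt_curve(temps, tm=55.0)
variant = synthetic_melt_curve(temps, tm=47.0)
print(delta_tm(temps, wt, variant))
```

with real data, the two curves would be replaced by the measured fluorescence traces for the wt and variant proteins recorded over the same temperature ramp; noisy traces would normally be smoothed before differentiation.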
we also tested the activity of the d a, h a, and h d artemis variants in vitro using single-stranded ’ end radiolabelled dna as a substrate in a gel-based assay. all three variants lost their ability to digest the dna substrate in vitro (figure ). these observations are in agreement with the results obtained with full-length variants by ege et al. and pannicke et al. [ , ]. their studies show that the full-length artemis variants h a, h d, and d n are able to interact with and be phosphorylated by dna-pkcs, but have lost the ability to digest dna substrates in vitro. the combined results reveal the importance of the hxhxdh motif and highlight the importance of the di-metal catalytic core in the snm family, not only in directly catalysing hydrolysis, but also likely in conformational changes involved in catalysis [ ].

identification of small molecule inhibitors of artemis

radiotherapy is a mainstay of cancer therapy; its effectiveness relies on inducing dna double-strand breaks (dsbs) that contain complex, chemically modified ends that must be processed prior to repair [ , ]. the canonical non-homologous end-joining (c-nhej) pathway repairs % of dna double-strand breaks in mammalian cells [ , ]. therefore, combining radiation therapy with c-nhej inhibitors could selectively radiosensitise tumours. weterings et al. have reported a compound that interferes with the binding of ku / to dna, thereby increasing sensitivity to ionising radiation in human cell lines [ ]; atm inhibitors are also in advanced clinical development and represent the most developed strategy to inhibit dsb repair to increase the efficacy of radiotherapy [ ].
artemis, along with snm a and snm b, possesses a conserved mbl-fold domain that is similar to the true bacterial mbls. previous studies on human snm a and snm b/apollo showed that ceftriaxone (rocephin), a widely used β-lactam antibacterial (third generation cephalosporin), inhibits the nuclease activity of both snm a and snm b [ ]. to investigate if this class of β-lactam antibacterial compounds could inhibit artemis we performed fluorescence-based nuclease assays with three cephalosporins, i.e. ceftriaxone, cefotaxime and -aminocephalosporanic acid (figure a). the results show that neither cefotaxime nor the parent compound, -aminocephalosporanic acid, potently inhibits artemis’ activity, whilst ceftriaxone inhibits artemis with a modest ic of µm (figure b). we solved the structure of ceftriaxone bound to the catalytic domain of artemis (purified by imac) at . Å resolution (figure c) by soaking an artemis crystal with ceftriaxone. this structure was solved by molecular replacement (using pdb: tt as a model), in the space group p with one protein molecule in the asymmetric unit. as before, in this structure artemis possesses the canonical bilobar mbl and β-casp fold with an active site containing one nickel ion, possibly related to the purification method. ceftriaxone binds to the protein surface in an extended manner making interactions with the active site, towards the β-casp domain (figure c). there is no evidence for cleavage of the β-lactam ring nor for loss of the c- ’ cephalosporin side chain, reactions that can occur during ‘true’ mbl catalysed cephalosporin hydrolysis. the electron density at the active site clearly reveals the presence of the ceftriaxone side chain in a position to coordinate the nickel ion (at the zn site) replacing water molecules (waters and ) compared with the apo structure (figure d).
despite the conservation of key elements of the active site of the mbl fold nucleases and the ‘true’ β-lactam hydrolysing mbls [ ], ceftriaxone does not interact with the nickel ion via its β-lactam ring (as occurs for the true mbls), but via both carbonyl oxygens of the cyclic , diamide in its sidechain (ni-o distances: Å and . Å), i.e. it is not positioned for productive β-lactam hydrolysis. the amino-thiazole group (n ) of ceftriaxone forms hydrogen bonds with the side chain of asn , while the s of the -aminocephalosporanic acid core of the compound interacts with the hydroxyl of tyr through an ethylene glycol molecule. the rest of the molecule appears to be flexible. the binding mode of ceftriaxone to artemis shown in figure c is near identical to that observed for ceftriaxone in the snm a (pdb: nzw) structure (suppl. figure ). one notable difference between the ceftriaxone-bound artemis structure and the apo structure is the loss of a second metal ion at the active site (figure d). in the apo structure (figure c), this zinc ion is coordinated by residues asp , his , and asp . with a single metal coordination in the ceftriaxone bound structure, the asp side chain is positioned away from the active site (figure d), as seen in the nickel bound (pdb: tt ) structure (figure e).

to investigate the possibility of inhibiting artemis through binding to the zinc finger motif in the β-casp domain, we used the fluorescence-based nuclease assay to test three compounds known to react with thiol groups present in zinc fingers and which result in zinc ion displacement, i.e.
ebselen [ ], auranofin [ ] and disulfiram [ ]. we found that both ebselen and disulfiram inhibit artemis with ic values around . µm and . µm respectively, whilst auranofin inhibits less potently (ic µm) (suppl. figure ), indicating additional possible inhibitory strategies.

discussion

the dclre c/artemis gene was first discovered in , following work with children with severe combined immunodeficiency disease (scid) [ ]. subsequent studies have shown that artemis is a key enzyme in v(d)j recombination [ , , ] and the c-nhej dna repair pathway [ , , ], and that it is a structure-specific endonuclease and a member of the mbl fold structural superfamily [ ]. our structures of wild-type and catalytic site mutants of the snm c/dclre c or artemis protein show that, like snm a and snm b/apollo and the rna processing enzyme cpsf , artemis has a typical α/β-β/α sandwich fold in its mbl domain and has a β-casp domain, the latter a characteristic feature of mbl fold nucleases. however, both our artemis structures and those recently reported by karim et al. [ ] reveal a unique structural feature of artemis in its β-casp domain that is not reported in other human mbl enzymes, i.e. a classical zinc-finger like motif. moreover, collectively, these structures allow us to assign a likely mode of dna substrate interaction for artemis. the role of the newly-described zinc-finger like motif remains unknown. however, zinc-finger motifs are common structural features in dna binding proteins such as transcription factors [ , ], and are also observed in a substantial number of required and accessory nhej proteins [ ]. these zinc fingers provide structural stability and enhance substrate selectivity rather than being involved in catalytic reactions, and we propose that this is likely to be the case for artemis.
the fact that the residues (his , his , cys , and cys ) that are involved in the zinc-finger like motif are highly conserved across artemis proteins from different species suggests the importance of this structural feature. furthermore, point mutations in his and his (h n and h l) have been reported in patients with a scid phenotype [ ]. the presence of one or two metal ions coordinated by the hxhxdh motif at the active site of artemis reflects a hallmark of the snm enzyme family [ , ]; the available evidence implies that metal ion binding at one site (zn site in standard mbl nomenclature) is stronger than at the other (zn site). by analogy with studies on the true mbls, these metal ions are proposed to activate a water molecule that acts as the nucleophile for phosphodiester cleavage. our structure (pdb: af ) suggests that the native metal ion(s) residing in the active site of artemis is zinc, although a nickel ion can also occupy the same site depending on how the protein was purified (pdb: tt ). neither the presence of a ni ion in the active site, nor the truncation of the c-terminal tail appears to inhibit, at least substantially, the activity of artemis. thus, using radio-labelled gel-based nuclease assays, we showed that the truncated artemis catalytic domain (aa – ) with either zn or ni ions in the active site (as observed crystallographically in the same preparations) has similar activity to the full-length artemis construct (aa - ).
therefore, it seems likely that nickel ions are able to replace zinc ions in solution; catalysis by mbl fold enzymes, including hydrolytic reactions, with metal ions other than zinc is well-precedented [ , ].

we also solved structures of three artemis catalytic mutants: d a, h a, and an omenn syndrome patient mutation, h d. using gel-based nuclease assays, we showed that these variants are biochemically inactive. overall, the three variant structures are similar to the wt structure, even though h a and h d entirely lack any metal ions in the active site, although zinc was present in the zinc finger. mutation of asp to alanine results in the loss of the second metal in the catalytic site, likely explaining the loss of activity, although the first metal ion is still present. note that some mbl fold hydrolases use two metal ions (e.g. the b and b subfamilies of the true mbls and rnase j from bacillus subtilis) (suppl. figure a and c) [ , ] whereas others, sometimes with apparently very similar active sites, only use one metal ion (e.g. the b subfamily of the true mbls and rnase j from staphylococcus epidermidis) [ , ] (suppl. figure b and d). thus, whilst our results support the importance of having both metals for the nuclease activity of artemis, subtle features can influence mbl fold enzyme activity [ , ].

following re-analysis of the karim et al. structure (pdb code wo ), we were able to generate a model of a dna overhang in complex with artemis that informs on the substrate binding mode. our model shows that artemis interacts with the dna substrate at the interface between the mbl and the β-casp domains. this interaction is mediated through the combination of polar or positive residues and aromatic residues of artemis and the dna substrate (figure ). artemis is the only identified mbl/β-casp dna processing enzyme that possesses substantive endonuclease activity. by contrast both snm a and snm b/apollo are strictly phosphate exonucleases [ , , ].
the recent structure of snm b/apollo in complex with two deoxyadenosine monophosphate nucleotides (pdb code: a f) reported by baddock et al. reveals a cluster of residues that form a ′-phosphate binding pocket, adjacent to the metal centre. structural sequence alignments of the three proteins show that these residues are highly conserved in snm a and snm b (suppl. figure ). apart from ser , none of these conserved phosphate binding pocket residues are present in artemis. instead, the pocket is partially occupied by the phe side chain, which is absent in both snm a and snm b. artemis also possesses a longer and more flexible loop connecting β-strands and (figure b), compared to the same loop in snm a and snm b that makes up part of the ′-phosphate binding pocket. the flexibility of this loop could enable accommodation of different types of dna structures, such as hairpins and ′-overhangs. these differences plausibly explain artemis’ substrate preferences and its primary activity as a structure-selective endonuclease.

of the three β-lactam antibacterial compounds previously shown to inhibit snm a [ ], we only observed inhibition of artemis with ceftriaxone. although the potency of inhibition is moderate (ic µm), we were able to solve the structure of ceftriaxone in complex with artemis. notably, ceftriaxone does not bind with its β-lactam carbonyl located at the active site where it ligates to one zinc (or other metal) ion, but instead binds the single nickel ion in bidentate manner via the carbonyls of its cyclic , diamide on its c- ’ sidechain [ , ].
studies with the true mbls have shown that appropriate derivatisation of weakly binding molecules can lead to highly potent and selective inhibitors.

in proof of principle attempts to inhibit artemis through its novel structural feature compared to other mbl fold nucleases, i.e. via its zinc-finger like motif, we tested three covalent inhibitors with thiol-reactive groups. ebselen, disulfiram and auranofin have the potential to interact with zinc fingers, including via zinc ejection with consequent protein destabilization [ , , , ]. both ebselen and auranofin are reported to have some antimicrobial properties [ ]; ebselen is in clinical trials for a variety of conditions, ranging from stroke to bipolar disorder [ ], and auranofin is used for treatment of rheumatoid arthritis [ ]. recent studies have also shown that ebselen inhibits enzymes from sars-cov- , i.e. the main protease (mpro) and the exonuclease exon (nsp exon-nsp ) complex [ , ]. disulfiram is a known acetaldehyde dehydrogenase inhibitor used in treatment for alcohol abuse disorder [ ]. our results show that both ebselen and disulfiram inhibit artemis (ic s . µm and . µm, respectively), whilst auranofin is less potent (ic µm).

studies focussed on inhibiting the mbl fold nucleases are at an early stage compared with work on the true mbls. the structures and assay results presented here provide starting points with established drugs, from which it might be possible to generate selective artemis inhibitors, either binding at the active site or elsewhere (including the apparently unique zinc finger of artemis), in order to radiosensitise cells.

accession numbers

coordinates and structure factors have been deposited in the protein data bank under accession codes tt , af , afs, afu, agi, apv and abs.

supplementary data
supplementary data are available at nar online.

acknowledgement

we are very grateful to dr rod chalk and tiago moreira for mass spectrometry, and dr neil patterson for the helpful discussions and data collections at diamond light source. we acknowledge diamond light source for time on beamlines i , i and i under proposal mx .

funding

this work was supported by a cancer research uk programme award [a to pjm, og and cjs] and wellcome trust grant [ /zz /z to og and /z/ /z to cjs].

conflict of interest

the authors declare no conflict of interest.

references

. yang w ( ) nucleases: diversity of structure, function and mechanism. . callebaut i, moshous d, mornon j-p & de villartay jp ( ) metallo-beta-lactamase fold within nucleic acids processing enzymes: the beta-casp family. nucleic acids res. , – . . allerston ck, lee sy, newman ja, schofield cj, mchugh pj & gileadi o ( ) the structures of the snm a and snm b/apollo nuclease domains reveal a potential basis for their distinct dna processing activities. nucleic acids res. , – . . goodarzi aa, yu y, riballo e, douglas p, walker sa, ye r, härer c, marchetti c, morrice n, jeggo pa & lees-miller sp ( ) dna-pk autophosphorylation facilitates artemis endonuclease activity. embo j. , – . . malu s, de ioannes p, kozlov m, greene m, francis d, hanna m, pena j, escalante cr, kurosawa a, erdjument-bromage h, tempst p, adachi n, vezzoni p, villa a, aggarwal ak & cortes p ( ) artemis c-terminal region facilitates v(d)j recombination through its interactions with dna ligase iv and dna-pkcs. j. exp. med. , – . . niewolik d, pannicke u, lu h, ma y, wang lcv, kulesza p, zandi e, lieber mr & schwarz k ( ) dna-pkcs dependence of artemis endonucleolytic activity, differences between hairpins and ′ or ′ overhangs. j. biol. chem. , – . .
niewolik d, peter i, butscher c & schwarz k ( ) autoinhibition of the nuclease artemis is mediated by a physical interaction between its catalytic and c-terminal domains. j. biol. chem. , – . . li s, chang hh, niewolik d, hedrick mp, pinkerton ab, hassig ca, schwarz k & lieber mr ( ) evidence that the dna endonuclease artemis also has intrinsic ′-exonuclease activity. j. biol. chem. , – . . baddock ht, yosaatmadja y, newman ja, schofield cj, gileadi o & mchugh pj ( ) the snm a dna repair nuclease. dna repair (amst). , . . yan y, akhter s, zhang x & legerski r ( ) the multifunctional snm gene family: not just nucleases. future oncol. , – . . wang at, sengerova b, cattell e, inagawa t, hartley jm, kiakos k, burgess-brown na, p sl, h ej, schofield cj, gileadi o, hartley ja & mchugh pj ( ad) human snm a and xpf– ercc collaborate to initiate dna interstrand cross-link repair. genes dev. , – . . lenain c, bauwens s, amiard s, brunori m, giraud-panis mj & gilson e ( ) the apollo ′ exonuclease functions together with trf to protect telomeres from dna repair. curr. biol. , – . . van overbeek m & de lange t ( ) apollo, an artemis-related nuclease, interacts with trf and protects human telomeres in s phase. curr. biol. , – . . demuth i, digweed m & concannon p ( ) human snm b is required for normal cellular response to both dna interstrand crosslink-inducing agents and ionizing radiation. oncogene , – . . sengerová b, allerston ck, abu m, lee sy, hartley j, kiakos k, schofield cj, hartley ja, gileadi o & mchugh pj ( ) characterization of the human snm a and snm b/apollo dna repair exonucleases.
j. biol. chem. , – . . mansilla-soto j & cortes p ( ) vdj recombination: artemis and its in vivo role in hairpin opening. j. exp. med. , – . . ma y, pannicke u, schwarz k & lieber mr ( ) hairpin opening and overhang processing by an artemis/dna-dependent protein kinase complex in nonhomologous end joining and v(d)j recombination. cell , – . . ma y, schwarz k & lieber mr ( ) the artemis:dna-pkcs endonuclease cleaves dna loops, flaps, and gaps. dna repair (amst). , – . . moshous d, callebaut i, de chasseval r, corneo b, cavazzana-calvo m, le deist f, tezcan i, sanal o, bertrand y, philippe n, fischer a & de villartay jp ( ) artemis, a novel dna double-strand break repair/v(d)j recombination protein, is mutated in human severe combined immune deficiency. cell , – . . lieber mr ( ) the mechanism of dsb repair by the nhej. annu. rev. biochem. , – . . gu j, li s, zhang x, wang lc, niewolik d, schwarz k, legerski rj, zandi e & lieber mr ( ) dna-pkcs regulates a single-stranded dna endonuclease activity of artemis. dna repair (amst). , – . . srivastava m & raghavan sc ( ) dna double-strand break repair inhibitors as cancer therapeutics. chem. biol. , – . . pannunzio nr, watanabe g & lieber mr ( ) nonhomologous dna end-joining for repair of dna double-strand breaks. j. biol. chem. , – . . shockett pe & schatz dg ( ) dna hairpin opening mediated by the rag and rag proteins. mol. cell. biol. , – . . de p, peak mm & rodgers kk ( ) dna cleavage activity of the v(d)j recombination protein rag is autoregulated. mol. cell. biol. , – . .
kim ms, lapkouski m, yang w & gellert m ( ) crystal structure of the v(d)j recombinase rag -rag . nature , – . . pannicke u, ma y, hopfner kp, niewolik d, lieber mr & schwarz k ( ) functional and biochemical dissection of the structure-specific nuclease artemis. embo j. , – . . barnes de, stamp g, rosewell i, denzel a & lindahl t ( ) targeted disruption of the gene encoding dna ligase iv leads to lethality in embryonic mice. curr. biol. , – . . roth db, menetski jp, nakajima p, bosma mj & gellert m ( ) v ( d ) j recombination : broken dna molecules with covalently sealed ( hairpin ) coding ends in scid mouse thymocytes. , – . . bassing ch, swat w & alt fw ( ) the mechanism and regulation of chromosomal v(d)j recombination. cell , – . . ege m, ma y, manfras b, kalwak k, lu h, lieber mr, schwarz k & pannicke u ( ) plenary paper omenn syndrome due to artemis mutations. blood , – . . volk t, pannicke u, reisli i, bulashevska a, ritter j, björkman a, schäffer aa, fliegauf m, sayar eh, salzer u, fisch p, pfeifer d, virgilio m di, cao h, yang f, zimmermann k, keles s, schindler d, hammarström l, caliskaner z, rizzi m, hummel m, pan-hammarström q, schwarz k & grimbacher b ( ) dclre c ( artemis ) mutations causing phenotypes ranging from atypical severe combined immunodeficiency to mere antibody deficiency. , – . . li l, moshous d, zhou y, wang j, xie g, salido e, hu d & cowan mj ( ) a founder mutation in artemis, an snm -like protein, causes scid in athabascan-speaking native americans. j. immunol. , – . . felgentreff k, lee yn, frugoni f, du l, van der burg m, giliani s, tezcan i, reisli i, mejstrikova e, de villartay jp, sleckman bp, manis j & notarangelo ld ( ) functional analysis of naturally occurring dclre c mutations and correlation with the clinical phenotype of artemis deficiency. j. allergy clin. immunol. , - .e . . 
savitsky p, bray j, cooper cdo, marsden bd, mahajan p, burgess-brown na & gileadi o ( ) high-throughput production of human proteins for crystallization: the sgc experience. j. struct. biol. , – . . dominy cn & andrews dw ( ) site-directed mutagenesis by inverse pcr. methods mol. biol. , – . . winter g, waterman dg, parkhurst jm, brewster as, gildea rj, gerstel m, fuentes-montero l, vollmar m, michels-clark t, young id, sauter nk & evans g ( ) dials: implementation and evaluation of a new integration package. acta crystallogr. sect. d struct. biol. , – . . mccoy aj, grosse-kunstleve rw, adams pd, winn md, storoni lc & read rj ( ) phaser crystallographic software. j. appl. crystallogr. , – . . emsley p & cowtan k ( ) coot: model-building tools for molecular graphics. acta crystallogr. sect. d biol. crystallogr. , – . . murshudov gn, skubák p, lebedev aa, pannu ns, steiner ra, nicholls ra, winn md, long f & vagin aa ( ) refmac for the refinement of macromolecular crystal structures. acta crystallogr. sect. d biol. crystallogr. , – . . lee sy, brem j, pettinati i, claridge tdw, gileadi o, schofield cj & mchugh pj ( ) cephalosporins inhibit human metallo β-lactamase fold dna repair nucleases snm a and snm b/apollo. chem. commun. , – . . carfi a, pares s, duée e, galleni m, duez c, frère jm & dideberg o ( ) the -d structure of a zinc metallo-β-lactamase from bacillus cereus reveals a new type of protein fold. embo j. , – . . li x & moses re ( ) the β-lactamase motif in snm is required for repair of dna double-strand breaks caused by interstrand crosslinks in s. cerevisiae.
dna repair (amst). , – . . de villartay jp, shimazaki n, charbonnier jb, fischer a, mornon jp, lieber mr & callebaut i ( ) a histidine in the β-casp domain of artemis is critical for its full in vitro and in vivo functions. dna repair (amst). , – . . mandel cr, kaneko s, zhang h, gebauer d, vethantham v, manley jl & tong l ( ) polyadenylation factor cpsf- is the pre-mrna ′-end-processing endonuclease. nature , – . . alberts il, nadassy k & wodak sj ( ) analysis of zinc binding sites in protein crystal structures. protein sci. , – . . ataie nj, hoang qq, zahniser mpd, tu y, milne a, petsko ga & ringe d ( ) zinc coordination geometry and ligand binding affinity: the structural and kinetic analysis of the second-shell serine residue and the methionine residue of the aminopeptidase from vibrio proteolyticus. biochemistry , – . . ishikawa h, nakagawa n, kuramitsu s & masui r ( ) crystal structure of ttha from thermus thermophilus hb , a rna degradation protein of the metallo-β-lactamase superfamily. , – . . matthews jm & sunde m ( ) zinc fingers - folds for many occasions. iubmb life , – . . wolfe sa, nekludova l & pabo co ( ) dna recognition by cys his zinc finger proteins. annu. rev. biophys. biomol. struct. , – . . krishna ss, majumdar i & grishin n v. ( ) structural classification of zinc fingers. nucleic acids res. , – . . singh jk & van attikum h ( ) dna double-strand break repair: putting zinc fingers on the sore spot. semin. cell dev. biol. . wu x, bishopric nh, discher dj, murphy bj & webster ka ( ) physical and functional sensitivity of zinc finger transcription factors to redox change. mol. cell. biol. , – .
. laity jh, lee bm & wright pe ( ) zinc finger proteins: new insights into structural and functional diversity. curr. opin. struct. biol. , – . . pannicke u, hönig m, schulze i, rohr j, heinz ga, braun s, janz i, rump em, seidel mg, matthes-martin s, soerensen j, greil j, stachel dk, belohradsky bh, albert mh, schulz a, ehl s, friedrich w & schwarz k ( ) the most frequent dclre c (artemis) mutations are based on homologous recombination events. hum. mutat. , – . . karim mf, liu s, laciak ar, volk l, rosenblum m, lieber mr, wu m, curtis r, huang n, carr g & zhu g ( ) structural analysis of the catalytic domain of artemis endonuclease/snm c reveals distinct structural features. j. biol. chem. , jbc.ra . . . ussery dw ( ) dna structure: a-, b- and z-dna helix families. encycl. life sci. . elrod-erickson m, rould ma, nekludova l & pabo co ( ) zif protein-dna complex refined at . Å: a model system for understanding zinc finger-dna interactions. structure , – . . locasale jw, napoli aa, chen s, berman hm & lawson cl ( ) signatures of protein-dna recognition in free dna binding sites. j. mol. biol. , – . . mandel cr, kaneko s, zhang h, gebauer d, vethantham v, manley jl & tong l ( ) polyadenylation factor cpsf- is the pre-mrna ’-end-processing endonuclease. nature , – . . poinsignon c, moshous d, callebaut i, de chasseval r, villey i & de villartay jp ( ) the metallo-β-lactamase/β-casp domain of artemis constitutes the catalytic core for v(d)j recombination. j. exp. med. , – . . jekimovs c, bolderson e, suraweera a, adams m, o’byrne kj & richard dj ( ) chemotherapeutic compounds targeting the dna double-strand break repair pathways: the good, the bad, and the promising. front. oncol. apr, – . . shibata a & jeggo p ( ) a historical reflection on our understanding of radiation-induced dna double strand break repair in somatic mammalian cells; interfacing the past with the present. int. j. radiat. biol. , – . .
weterings e, gallegos ac, dominick ln, cooke ls, bartels tn, vagner j, matsunaga to & mahadevan d ( ) a novel small molecule inhibitor of the dna repair protein ku / . dna repair (amst). , – . . jin mh & oh dy ( ) atm in dna repair in cancer. pharmacol. ther. , . . lee sy, brem j, pettinati i, claridge tdw, gileadi o, schofield cj & mchugh pj ( ) cephalosporins inhibit human metallo β-lactamase fold dna repair nucleases snm a and snm b/apollo. chem. commun. , – . . palzkill t ( ) metallo-β-lactamase structure and function. ann. n. y. acad. sci. , – . . spraggon g, koesema e, scarselli m, malito e, biagini m, norais n, emolo c, barocchi ma, giusti f, hilleringmann m, rappuoli r, lesley s, covacci a, masignani v & ferlenghi i ( ) supramolecular organization of the repetitive backbone unit of the streptococcus pneumoniae pilus. plos one . . abbehausen c ( ) zinc finger domains as therapeutic targets for metal-based compounds - an update. . . chen s, jeng k & lai mmc ( ) zinc finger-containing cellular transcription corepressor zbtb promotes influenza virus rna transcription and is a target for zinc ejector drugs. , – . . brandt vl & roth db ( ) artemis: guarding small children and, now, the genome. j. clin. invest. , – . . vázquez-torres a ( ) redox active thiol sensors of oxidative and nitrosative stress. antioxidants redox signal. , – . . mao z, bozzella m, seluanov a & gorbunova v ( ) comparison of nonhomologous end joining and homologous recombination in human cells. dna repair (amst). , – . .
pettinati i, brem j, lee sy, mchugh pj & schofield cj ( ) the chemical biology of human metallo-β-lactamase fold proteins. trends biochem. sci. , – . . cahill st, tarhonskaya h, rydzik am, flashman e, mcdonough ma, schofield cj & brem j ( ) use of ferrous iron by metallo-β-lactamases. j. inorg. biochem. , – . . newman ja, hewitt l, rodrigues c, solovyova a, harwood cr & lewis rj ( ) unusual, dual endo- and exonuclease activity in the degradosome explained by crystal structure analysis of rnase j . structure , – . . raj r, nadig s, patel t & gopal b ( ) structural and biochemical characteristics of two staphylococcus epidermidis rnase j paralogues rnase j and rnase j . j. biol. chem., jbc.ra . . . hamed rb, gomez-castellanos r, henry l, ducho c, mcdonough ma & schofield cj ( ) the enzymes of β-lactam biosynthesis. nat. prod. rep. , – . . antony s & bayse ca ( ) density functional theory study of the attack of ebselen on a zinc-finger model. inorg. chem. , – . . lee y, wang y, duh y, yuan hs & lim c ( ) identification of labile zn sites in drug-target proteins. j. am. chem. soc. , – . . may hc, yu jj, n. guentzel m, chambers jp, cap ap & arulanandam bp ( ) repurposing auranofin, ebselen, and px- as antimicrobial agents targeting the thioredoxin system. front. microbiol. , – . . noguchi n ( ) ebselen, a useful tool for understanding cellular redox biology and a promising drug candidate for use in human diseases. arch. biochem. biophys. , – . . roder c & thomson mj ( ) auranofin: repurposing an old drug for a golden new age. drugs r d , – . . jin z, du x, xu y, deng y, liu m, zhao y, zhang b, li x, zhang l, peng c, duan y, yu j, wang l, yang k, liu f, jiang r, yang x, you t, liu x, yang x, bai f, liu h, liu x, guddat lw, xu w, xiao g, qin c, shi z, jiang h, rao z & yang h ( ) structure of mpro from sars-cov- and discovery of its inhibitors. nature , – . . baddock ht, brolih s, yosaatmadja y, ratnaweera m, bielinski m, swift l, cruz-migoni a,
morris gm, schofield cj, gileadi o & mchugh pj ( ) characterisation of the sars-cov- exon (nsp exon-nsp ) complex: implications for its role in viral genome stability and inhibitor identification. biorxiv, . . . . . skinner md, lahmek p, pham h & aubin hj ( ) disulfiram efficacy in the treatment of alcohol dependence: a meta-analysis. plos one .

figures

figure : overall architecture of artemis. a. a cartoon representation of the structure of human snm c/artemis. the active site containing mbl domain is in pink; the β-casp domain (white) contains a novel zinc-finger like motif that has not been identified in other mbl/β-casp nucleic acid processing enzymes. the three zinc ions are represented by grey spheres. b. topology diagram of artemis protein. the β-strands are represented by arrows and α-helices by cylinders. the mbl domain (pink) has the typical α/β-β/α sandwich of the mbl superfamily, with an insert of the β-casp domain (white) between the small helix α and helix α . c. overlay of structures of the human snm family members: snm a, snm b and snm c. d. cartoon representation of amino acid sequence alignment for human snm a, snm b and snm c, showing the conserved mbl and β-casp domains.
figure : active site views of human mbl/β-casp nuclease family enzymes. each of these catalytic sites contains highly conserved motifs ( - , in red). motif = asp, motif = his and asp (hxhxdh), motif = his and motif = asp. a. the human snm a active site with a single octahedral zinc ion (grey) coordination (pdb: ahr). b. the active site of human snm b/apollo (pdb: a f) with a nickel ion (green) and an iron ion (orange) with a coordinating amp molecule. c. human snm c/artemis (pdb: af ) purified in the absence of imac with two zinc ions (in grey) in its active site. a water molecule (asterisk*) shared between the two metals is the proposed nucleophile for the hydrolytic reaction. d. the active site of the human rna processing enzyme cpsf (pdb: i t). the second zinc ion (m ) is coordinated by an additional histidine residue (his ) which has no counterpart in the snm proteins. e. the active site of human snm c/artemis (pdb: tt ) purified with imac. a nickel ion is present in the first metal coordination site.

figure : comparison of a novel zinc-finger like motif in the β-casp domain of artemis with a canonical zinc-finger motif. a.
cartoon representation of the classical cys his zinc-finger motif, from transcription factor sp f (pdb code: sp ). this has a ββα fold, where two cys- and two his-residues are involved in zinc ion coordination and the sidechains of three conserved hydrophobic residues are shown. b. the β-casp region of artemis has a novel zinc-finger like motif. the inset shows the four residues (two his and two cys) coordinating the zinc ion (grey). the fo ̶ fc electron- density map (scaled to . σ in pymol) surrounding the zinc ion before it was included in refinement. figure : overall structure representation of the artemis /snm c fold. a. overlay of our wt artemis structure (pdb code: af ) (pink) with that of karim et. al. pdb code: wo (aquamarine) (backbone rmsd . Å). b. .cc-by . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . / re-analysis of the latter ( wo ) structure re-refined with a dna molecule present (pdb code; abs). the fo- fc map (contoured at . σ in pymol) is represented by grey mesh surrounding the dna (yellow). figure : electrostatic surface potentials of dna bound model for artemis/snm c (a) and apollo/snm b (b). the blue colour represents a more electropositive surface potential and the red show a more electronegative cluster. the active site contains the two metal ions represented in grey sphere for zinc, orange sphere for iron and green sphere for nickel ion. n- and c- terminal of the protein are indicated in red. the electrostatic surface potentials were generated using pymol (electrostatic range at +/- ). .cc-by . 
figure : proposed interactions of dna with artemis (pdb: abs). model for dna binding to artemis showing the residues contacting dna. the two zinc ions at the active site are represented by grey spheres. a. a row of positively charged residues on the surface of the mbl domain interacts with the phosphate backbone of the dna. b. a dna overhang is located at the active site. a cluster of polar residues (n , k and k ) is located in the β-casp domain. the extended flexible loop that connects b and b , which is unique to artemis compared to snm a and snm b, is indicated on the right. c. the dna overhang forms a hydrogen bond with arginine and interacts with a cluster of hydrophobic residues at the interface between the mbl and β-casp domains.
figure : nuclease assay utilising truncated artemis (aa – ) with various dna substrates. a. the nuclease activity of artemis is indifferent to the ’ end group, indicative of true endonuclease activity. increasing concentrations of artemis from (ne; no enzyme) to nm were incubated with nm ssdna with either a ’ phosphate, ’ hydroxyl, or ’ biotin moiety for min at °c. b. artemis is able to cleave dna substrates containing single-stranded regions. increasing amounts of artemis were incubated with structurally diverse dna substrates ( nm) for min at °c. products for a and b were analysed by % denaturing page. the dna substrates utilised are represented at the top of the lanes and a red asterisk indicates the position of the ’ radiolabel. the positions of dna size markers run as a reference are indicated on the left, with sizes in nt.
[figure panel labels: ne (no enzyme); [snm c] (nm); substrates ssdna, dsdna, ’ overhang, splayed arms, leading flap, lagging flap, replication fork; end groups ’ pho, ’ oh, ’ bio; digestion products; marker sizes in nt]
figure : views from the structures of the artemis d a, h a and h d variants. a. an overlay of the four artemis structures: wt (pdb: af ) in pink, d a variant (pdb: afs) in yellow, h a variant (pdb: afu) in cyan and h d patient mutation (pdb: agi) in blue, showing that the general architecture of the three variants is the same as that of the wt structure. the nickel ion is represented as a green sphere, and the zinc ions as grey spheres. the movement of helix αe in the β-casp domain is indicated by a red arrow. b. left: the active site of the d a variant has a single nickel ion with two complexing water molecules (red spheres); right: the active site residues of wt artemis (pink), superimposed with those of the d a variant (yellow).
aside from loss of the second metal in the d a mutant, there is little movement at the active site. c. active site residues of the h a variant (cyan; left) and an overlay (right) with wt artemis (pink). the two distinguishing features of the h a variant are a lack of metal ions and movement of the loop containing his . d. the active site of the h d variant (blue; left) and an overlay (right) with wt artemis (pink). the h d point substitution is present in patients with omenn syndrome. this variant lacks both metal ions, similarly to the h a variant.
figure : comparing the activity of artemis variants vs wt protein. increasing amounts (from to nm) of wt and mutant artemis proteins (as indicated) were incubated with nm of nucleotide ssdna substrate for min at °c. reaction products were subsequently analysed by % denaturing page. the sizes (in nucleotides) of the marker oligonucleotides are indicated on the left-hand side of the corresponding bands.
figure : artemis inhibition by β-lactam antibiotics. a. structures of selected β-lactam antibacterials. b. inhibitor profiles of β-lactams on the nuclease activity of artemis were assessed via a real-time fluorescence-based nuclease assay. c. cartoon representation of the structure of truncated artemis (aa - ) with a ceftriaxone molecule (in white) bound at the active site. d. active site residues with the electron density (fo-fc) contoured map around the modelled ceftriaxone. the map is contoured at the . σ level and was calculated before the ceftriaxone molecule was included in the refinement.
tables
table : data collection and refinement statistics. * data in parentheses are for the high-resolution shell.
pdb id tt (ni and zn) af ( zn) afs (d a) afu (h a) agi (h d) apv (ceftriaxone) abs (dna bound)
data collection and processing
diffraction source dls (i ) dls (i ) dls (i ) dls (i ) dls (i ) dls (i ) aps - idd
wavelength (Å) . . . . . . .
space group p p p p p p p
cell dimensions
a, b, c (Å) . , . , . . , . , . . , . , . . , . , . . , . , . . , . , . . , . , .
α, β, γ (°) . , . , . . , . , . . , . , . . , . , . . , . , . . , . , . . , . , .
resolution (Å) * . - . . - . . - . . - . . - . . - . – . ( . - . ) ( . - . ) ( . - . ) ( . - . ) ( . - . ) ( . - . ) ( . – . )
rmerge (%)* . ( . ) . ( . ) . ( . ) . ( . ) . ( . ) . ( . ) . ( . )
i/σ(i) . ( . ) . ( . ) . ( . ) . ( . ) . ( . ) . ( . ) ( . )
completeness (%) . ( . ) . ( . ) . ( . ) . ( . ) . ( . ) . ( . ) . ( . )
multiplicity . ( . ) . ( . ) . ( . ) . ( . ) . ( . ) . ( . ) . ( . )
refinements
resolution (Å) . - . . - . . - . . - . . - . . - . - .
no. of reflections
rwork . . . . . . .
rfree . . . . . . .
no. of atoms
protein
water
zinc/nickel
ethylene glycol -
dna - - - - - -
ceftriaxone - - - - - -
b-factors . . . . . .
r.m.s. deviations
bond length (Å) . . . . . . .
bond angles (°) . . . . . . .
sars-cov- rbd in vitro evolution follows contagious mutation spread, yet generates an able infection inhibitor
jiří zahradník, shir marciano, maya shemesh, eyal zoler, jeanne chiaravalli, björn meyer, orly dym, nadav elad and gideon schreiber
department of biomolecular sciences, weizmann institute of science, rehovot , israel
chemogenomic and biological screening core facility, institut pasteur, paris, france
viral populations and pathogenesis unit, cnrs umr , institut pasteur, paris, france
department of life sciences core facilities, weizmann institute of science, rehovot , israel
department of chemical research support, weizmann institute of science, rehovot , israel
corresponding author: gideon.schreiber@weizmann.ac.il
short title: rbd in vitro evolution
abstract
sars-cov- is constantly evolving, with more contagious mutations spreading rapidly. using in vitro evolution to affinity mature the receptor-binding domain (rbd) of the spike protein towards ace resulted in the more contagious mutations, s n, e k, and n y, being among the first selected. this includes the british and south-african variants. plotting the binding affinity to ace of all rbd mutations against their incidence in the population shows a strong correlation between the two. further in vitro evolution enhancing binding by -fold provides guidelines towards potentially new evolving mutations with even higher infectivity, for example q r in combination with n y.
that said, the high-affinity rbd is also an efficient drug, inhibiting sars-cov- infection. the . Å cryo-em structure of the high-affinity complex, including all rapidly spreading mutations, provides a structural basis for future drug and vaccine development and for in silico evaluation of known antibodies.
(which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . .
sars-cov- , which causes covid- , resulted in an epidemic of global reach. it infects people through inhalation of viral particles, whether airborne or in droplets, or through touching infected surfaces. structural and functional studies have shown that a single receptor-binding domain (rbd) of the sars-cov- homotrimer spike glycoprotein interacts with ace , which serves as its receptor ( , ). its binding and subsequent cleavage by the host protease tmprss results in the fusion between cell and viral membranes and cell entry ( ). blocking the ace receptors by specific antibodies prevents viral entry ( , , ). in vitro binding measurements have shown that the sars-cov- s-protein binds ace with ~ nm affinity, which is about -fold tighter than the binding of the sars-cov s-protein ( , , ). it has been suggested that the higher affinity of sars-cov- is, at least partially, responsible for its higher infectivity ( ). recently evolved sars-cov- mutations in the spike protein's rbd have further strengthened this hypothesis. the “british” mutation (n y; variant b. . . ) was suggested from deep sequencing mutation analysis to enhance binding to ace ( ). the “south african” variant ( .v ), which includes three altered residues in the ace binding site (k n, e k, and n y), is spreading extremely rapidly, becoming the dominant lineage in the eastern cape and western cape provinces ( ).
another variant that seems to enhance sars-cov- infectivity is s n, which became dominant in many regions ( ). ace and tmprss are expressed in lung, trachea, and nasal tissue ( , ). the inhaled virus likely binds to epithelial cells in the nasal cavity and starts replicating. the virus propagates and migrates down the respiratory tract along the conducting airways, and a more robust innate immune response is triggered, which in some cases leads to severe disease. recently, a number of efficient vaccines, based on presenting the spike protein or on administering an inactivated virus, were approved for clinical use ( ). still, due to less than % protection, particularly for high-risk populations, and the continuously mutating virus, the development of drugs should continue. potential therapeutic targets for blocking viral entry into cells include molecules blocking the spike protein, the tmprss protease, or the ace receptor ( ). most prominently, multiple high-affinity neutralizing antibodies have been developed ( ). as alternatives to antibodies, soluble forms of the ace protein ( ) or engineered parts or mimics have also been shown to work ( , ). tmprss inhibitors were already previously developed, and are being repurposed for covid- ( ). the development of molecules blocking the ace protein did not receive as much attention as the other targets. one potential caveat with this approach is the importance of the ace activity in humans, which could be hampered by an inhibitor. ace functions as a carboxypeptidase, removing a single c-terminal amino acid from ang ii to generate ang-( - ), which is important in blood pressure regulation.
in addition, ace is fused to a collectrin-like domain, regulating amino acid transport and pancreatic insulin secretion ( , ). through these processes, ace also appears to regulate inflammation, and its downregulation relates to increased covid- severity. dalbavancin is one drug that has been shown to block the spike protein–ace interaction, however with low affinity (~ nm) ( ). notably, the rbd domain itself can be used as a competitive inhibitor of the ace receptor binding site. however, for this to work, its affinity has to be significantly optimized, to reach pm affinity. we have recently developed an enhanced strategy for yeast display, based on c- and n-terminal fusions of extremely bright fluorescent reporters that can monitor expression at minute levels, allowing for selection to proceed down to pm bait concentrations ( ). here, we demonstrate how this enhanced method allowed us to reach pm affinity between a mutant rbd and ace , based on multiple steps of selection that combine enhanced binding with increased rbd protein thermostability. fig. shows the selection process step by step. we took advantage of two different detection strategies, eunag and dnbalfa, and eliminated the dna purification step, which can be tedious ( , ) (fig. , steps and ). preceding library construction, we tested varied sizes of the rbd for optimal surface expression (table s ), and decided to continue using rbdcon for selection and rbdcon for protein expression (supplementary material text).
fig. enhanced yeast display benefits over the traditional method. the use of enhanced yeast display enables elimination of dna purification procedures between libraries (step ii.) and exclusion of the antibody-based expression labeling procedure (step vi.), and the bright reporters eunag (orange points, step .) or dnbalfa (green points, step .) allow for ultra-tight binding selection, with reduced background and increased sensitivity in a reduced time frame ( ).
fig. in vitro evolution of spike protein rbd using yeast display and the emergence of mutations in sars-cov- over time. (a) an overview of mutations identified during the yeast display affinity maturation process. the red and grey colored amino acids are dominant (˃ %) or minor (< %) at a given position. red and orange backgrounds highlight the mutations emerging both in clinical samples and in yeast display, with a high and low impact on binding affinity, respectively. the bottom of the table shows the naturally evolved mutations at the same positions. (b) the relation between inferred affinity changes and occurrence. red for prevalent mutations, black for others and empty squares for occurrence < sequences ( , ). blue dots are values from binding titration curves shown in (c), which were selected also by yeast display affinity maturation. (d) affinity changes and occurrence in the population (as in (b)) for different mutations at positions , and . (e) binding titration curves for the best binding variant in each successive yeast library, bound to ace -cf r at the given concentration. binding of additional clones from each library is shown in fig. s . (f, g) octet red system binding sensorgrams for rbd-wt (f) and rbd- (g).
rbd domain affinity maturation recapitulates multiple steps in the virus evolution
multiple consecutive libraries were constructed: s (stability-enhancing), b (ace binding), and fa (fast association). the whole rbd library (s , nucleotides - – ) was constructed by random mutagenesis, introducing - mutations per clone. the best expressing clones were selected after expression at °c. subsequently, library s was selected after expression at °c. the most significant mutation, which dominated the second library, was i f, which fits nicely inside the hydrophobic pocket formed in the rbd domain (fig. s ). this mutation nearly doubled the fluorescence signal intensity and was used for the construction of the subsequent affinity selection library (b ). the b library was constructed by components homologous recombination to preferentially incorporate the mutations in the binding interface area. the random mutagenesis was limited to nucleotides – . the library was expressed at °c to keep the pressure on protein stability, and selected by facs against decreasing concentrations of ace labeled with cf® r succinimidyl ester ( , , and pm; h of incubation). to isolate a low number of rbd variants with the strongest phenotype effect, library enrichment was done by selecting the top % of binding cells and, in subsequent rounds, the top . – % of yeast cells (fig. s ). plasmid dna was isolated from growing selected yeasts of the sorted library and used for e. coli cell transformation and the preparation of a new library (b ). this approach enriched the subsequent library with multiple selected mutations and enabled the screening of a wider sequence space and of cooperative mutations, as multiple trajectories are sampled. single colony isolates (sci) of transformed bacteria were used for sequencing to monitor the enrichment process and subsequently for binding affinity screening (fig. s ). the library b was selected with the same scheme using , , and pm ace receptor.
analysis of the selected b library yielded two dominant mutations appearing in ˃ % of clones: e k and n y. in addition, multiple minor mutations were found: v e, n y, i t, s n, n s, and f s (fig. a). the analysis of library b showed the absolute domination of e k and n y. besides the dominant clones, the n k, q r, and s n mutations rose to frequencies ˃ %, and new minor mutations were identified: g r, i v, t s, f y, and s p. to validate our results, we chose clones with different mutation profiles, expressed them in expi f™ cells, and subjected them to further analyses (figs. a, c, e, s , table and table s ). we noted that among the mutations selected and fixed in the yeast population during these initial steps of affinity maturation were three mutations that strongly emerged in clinical samples of sars-cov- : s n, e k, and n y ( , , ). it was already shown that an increase in rbd binding affinity increases pseudovirus entry ( ). to validate the relation between binding affinity and occurrence of specific mutations in the population, we combined our data with those obtained by deep mutational scanning of the rbd domain ( ) and the gisaid database ( ). fig. a shows the mutations selected by us, and the evolving, circulating sars-cov- variants at the same positions. in red are the most common mutations emerging in either (s n, e k, and n y). in addition, the yeast selection probed the most abundant naturally occurring variants in positions and , which were lost during further rounds of yeast selection. s p, which also occurs in nature but did not rapidly spread (fig. b), was selected in round .
further analysis of this clone (table , compare rbd- to rbd- ) shows that it increases the thermostability but decreases the association rate constant of the rbd to ace . finally, some mutations were found in sars-cov- (albeit at low frequency, fig. a and b) and not in the yeast selection ( , , , and ). to evaluate why some mutations were prevalent in sars-cov- and in yeast display selection, while others were not, we plotted the occurrence of all mutations in the gisaid database ( ) against the apparent change in the rbd-ace binding affinity (kdapp), as estimated by the frequency of given amino acids within a mutant library at the given concentration (the so-called deep mutational scanning approach ( )). figure b (red and black dots) shows that the more prevalent mutations have a higher binding affinity. to quantify these results, we measured the binding of re-cloned isogenic variants of the most prevalent mutations (fig. c). the kd values calculated here are shown as blue dots in fig. b. the highest binding affinity was measured for the south-african variant (e k, n y), which is the tightest binding clone of library b (fig. c and table s ), followed by the “british” (n y) and the european emerging s n mutations (figs. b, c and table ). the kd of the south-african variant is pm, that of the british variant is pm, and for s n a kd of pm was measured (compared to . nm for the wt). the affinity data measured here show an even stronger relation between binding affinity and spread in the population. to further test the lack of randomness in the selection of these mutations, we compared the occurrence of mutations at these three residues to other amino acids in the population with the apparent binding affinity. fig. d shows that indeed, in all cases (except e r), the most abundant variant in the population has the highest binding affinity at the given position.
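as an illustration of how an apparent kd can be read off a binding titration curve such as those referred to above, the following is a minimal sketch that assumes a simple 1:1 binding isotherm, signal = fmax * c / (c + kd). the function name fit_kd, the grid-search approach and the titration values are all hypothetical and chosen for illustration; they are not the data or the fitting procedure used in this work.

```python
def fit_kd(concs, signals, kd_min=1e-3, kd_max=1e6, steps=2000):
    """Least-squares fit of signal = fmax * c / (c + kd).

    kd is scanned on a log-spaced grid; for each trial kd the best
    amplitude fmax has a closed form (linear least squares).
    Concentrations and kd share the same (arbitrary) unit.
    """
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        x = [c / (c + kd) for c in concs]  # fraction bound at each bait concentration
        fmax = sum(s * xi for s, xi in zip(signals, x)) / sum(xi * xi for xi in x)
        sse = sum((s - fmax * xi) ** 2 for s, xi in zip(signals, x))
        if best is None or sse < best[0]:
            best = (sse, kd, fmax)
    return best[1], best[2]


# Hypothetical, noise-free titration generated with kd = 50 and fmax = 1000
# (illustrative values only, not measurements from this study).
concs = [1, 3, 10, 30, 100, 300, 1000]
signals = [1000 * c / (c + 50) for c in concs]
kd, fmax = fit_kd(concs, signals)
print(f"fitted kd ~ {kd:.1f}, fmax ~ {fmax:.0f}")
```

the grid search is used only to keep the sketch dependency-free; with real, noisy data a non-linear least-squares routine and replicate measurements would normally be preferred.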
with respect to e r, the mutation of glu to arg requires two nucleotide changes in the same codon, making this mutation reachable only by multiple rounds of random mutagenesis, which will delay its occurrence and may explain its low frequency (although this will not stop its spread). next, we monitored whether the spread of mutations in the population also relates to the protein stability of the rbd. here, we used the level of yeast surface expression as a proxy to estimate protein stability ( ). fig. s shows that mutations whose occurrence is increasing in the population did not affect protein stability, corroborating that maintaining protein stability is an important evolutionary constraint. the most abundant naturally occurring mutations in the rbd have been selected by yeast display, already in the first affinity maturation library (b ) ( , , ). next, we aimed to explore whether much higher affinity binding can be achieved.
exploring the affinity limits for ace -rbd interaction
further selection of better binders can demonstrate the future path of sars-cov- evolution. in parallel, an ultra-tight binder can be used as an effective ace blocker for inhibiting sars-cov- infection. we used the same approach as for b and b and created the subsequent library b . the library b was enriched by using pm ace as bait, followed by pm, and finally pm. sorting with less than pm bait was done after overnight incubation in ml solution to prevent the ligand depletion effect (as the number of ace molecules becomes much lower than the number of rbd molecules). round resulted in the fixation of mutations n k, e k, q r, and n y in all sequenced clones. mutations s n and s p were present with frequencies ˃ %.
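the earlier statement that a glu to arg substitution needs two nucleotide changes within one codon can be checked against the standard genetic code. the sketch below is only an illustration: the helper names (codons_for, hamming, min_changes) are hypothetical, and because the wild-type codon actually used at a given position is not stated here, both glu codons are evaluated. for other substitutions the answer depends on the starting codon, so the real wild-type codon would have to be supplied.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(aa):
    """All codons that encode the one-letter amino acid aa."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_changes(from_codon, to_aa):
    """Fewest nucleotide substitutions turning from_codon into any codon for to_aa."""
    return min(hamming(from_codon, codon) for codon in codons_for(to_aa))


# Glu -> Arg compared with Glu -> Lys, for each possible Glu codon.
glu_to_arg = {codon: min_changes(codon, "R") for codon in codons_for("E")}
glu_to_lys = {codon: min_changes(codon, "K") for codon in codons_for("E")}
print("glu -> arg:", glu_to_arg)
print("glu -> lys:", glu_to_lys)
```

under the standard code, both glu codons need two substitutions to reach any arg codon but only one to reach a lys codon, which is in line with the argument that e r is harder to reach by random mutagenesis than e k.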
additional mutations identified were g r, i v, and f y. representative clones with different mutational profiles were subjected to detailed analyses (table ). in the next selection step, we targeted faster association rates by using pre-equilibrium selection ( ). the new library fa (fast association) was created by randomization of the whole rbd gene population from the enriched b library. the library was pre-selected with pm ace for hrs (reaching equilibrium after overnight incubation) followed by hr and min incubation before selection. this resulted in the accumulation of additional mutations, v k, i t and t m, and also the fixation of the previously observed mutation s n in all sequenced clones. minor mutations n e, k f, v w, and s p, with only a single sequence each, were identified. one should note that v k and t m require two nucleotide mutations to be reached, demonstrating the efficiency of using multiple rounds of library creation on top of previous libraries (and not single clones). interestingly, these mutations were not located at the binding interface but rather in the peripheries, which is in line with previously described computational fast association design, where periphery mutations were central ( ). from the fa library, we determined the isogenic binding for different clones, with clone rbd- being the best. yeast display titration showed an affinity of . ± . pm (fig. e and table ). the other clones tested from the fa library had affinities between and pm (fig. s and table s ). the ace receptor and clones rbd- , rbd- , and rbd- were expressed and purified (fig. s ). measuring the binding affinity to ace using the octet red system showed a systematically lower binding affinity in comparison to yeast titration (table ). for wt, the affinity was reduced from . (yeast titration) to nm, and for rbd- it was reduced from . to pm. however, the improvement in affinity is similar for both methods (~ -fold). while most of the improvement came from reduced koff (fig. f and g), kon increased -fold, from . x to x m- s- for rbd- (table ). in addition, rbd- is °c more stable than wt, probably due to the introduction of the i f stabilizing mutation (fig. s ). to further increase the rbd- affinity, we prepared a site-directed mutational library on top of rbd- , including the mutations suggested from deep mutational scanning ( ) which require more than one nucleotide change to be reached (fig. s ). surprisingly, these mutations did not significantly increase the affinity of rbd- as they did for the wild-type ( ). yet, a combination of three of them stabilized rbd- by °c, creating rbd- , but at the cost of decreased binding affinity (table , fig. s ). this demonstrates the limitation of using single amino acid changes from deep mutational scanning to obtain high-affinity binders.
table – biophysical parameters of the mutant clones selected by yeast display. for more details see table s .
clone; library; plasmid (a); mutations; tm (b) [°c]; yeast display (c) kd,app (pm); octet red (d) kd (pm); kon (e) m- s- x
rbd-wt - pjydc aa – . ± ± . ± .
rbd- b pjydc i f, s n, n y nd ± nd
rbd- b pjydc i f, e k, n y nd ± . nd
rbd- b pjydc i f, i t, n y, n y . . ± . nd
rbd- b pjydc i f, s n, q r, n y . ± . nd
rbd- b pjydc i f, n k, e k, s p, q r, n y . ± . ± . ± .
rbd- b pjydc i f, n k, e k, q r, n y . . ± . ± . ± .
rbd- b pjydc i f, e k, q r, n y . . ± . nd
rbd- fa pjydc i f, v k, n k, i t, t m, s n, e k, q r, n y . . ± . ± ±
rbd- - pjydc i f, v w, r d, k v, v k, n k, i t, t m, s n, e k, q r, n y . ± . ± ±
(a) the pjydc plasmid uses the intrinsic eunag reporter; the pjydc plasmid contains the dnbalfa reporter; see ( ).
(b) melting temperature as measured by the differential scanning fluorimeter tycho nt. (nanotemper technologies gmbh).
(c) kd values measured between yeast surface-exposed rbd variants and the monomeric extracellular portion of the ace receptor.
(d, e) measured by the octet red system (fortebio) using ar g biosensors. for details see materials and methods.
fig. cryo-em structure of the ace -rbd- complex at . Å resolution. a) the cryo-em electron density map with ace (cyan), rbd- rbm (magenta), and rbd core (pink). b) cartoon representation of the ace -rbd- model with eight mutations resolved in the electron density map (orange). c) the s n, q r and n y mutations depicted in the rbm (orange spheres), interacting with s , q and k of ace respectively (cyan spheres), are situated at the two extremes of the rbd-ace interface and are suggested to stabilize the complex. d) the interaction network formed between rbd- mutations and ace . rbd-wt residues are in white (heteroatom coloring scheme). e) electrostatic complementarity between rbd and ace is strengthened in rbd- by positive charges at positions n k, e k, and q r. the black line on ace indicates the rbd binding site.
rbd- -ace structure
we determined the cryo-em structure of the n-terminal peptidase domain of the ace (g - y ) receptor bound to the rbd- (t -k ) (fig. a), including nine mutations (i f, v k, n k, i t, t m, s n, e k, q r, n y; table , figs. a and b).
structure comparison of the ace -rbd- complex and the wt complex (pdb id: m j) revealed their overall similarity, with an rmsd of . Å across amino acids of the ace and . Å among amino acids of the rbd (fig. s a). three segments, r -s (β , α ), g - v (α ), and f -h (β ), are disordered in rbd- and thus not visible in the electron density map (fig. s and blue cartoon in fig. s b). these segments are situated opposite the ace binding interface and are therefore not stabilized and rigidified by ace contacts. all mutations, except i f, are present in the electron density map. details of cryo-sample preparation, data acquisition, and structural determination are given in the supplementary materials and methods. the cryo-em data collection and refinement statistics are summarized in figs. s , s , s , and table s . mutations v k, n k, i t, t m, s n, e k, q r, and n y are part of the receptor-binding motif (rbm) that interacts directly with ace (orange spheres in fig. b and c) ( ). the rbm, including residues s -q , shows the most pronounced conformational differences in comparison to the rbd-wt (the black circle in fig. s a). of the nine mutations in the rbm, four involve intramolecular interactions that stabilize the rbd- structure, including hydrogen contacts between k and d , t and r , and m and y . the mutations s n, q r, and n y form new contacts with ace . the arg at position makes a salt bridge to q and a hydrogen contact to y of ace ; together with mutation n y (y contacts k ), it forms a strong network of new interactions, supporting the impact of these two residues (fig. d). calculating the electrostatic potential of the rbd- in comparison to rbd-wt shows a much more positive surface of the former, which is complementary to the negatively charged rbd binding surface on ace (fig. e). in addition, the mutation n
interacts with s of ace (fig. c). interestingly, the interface of ace -rbd involves the interaction of amino acid residues from the n-terminal segment q -q , k , and d of the ace domain and residues from the rbm domain of the rbd. the s n, q r, and n y mutations in rbd- are situated at the two extremes of the rbd-ace interface, therefore stabilizing the complex (fig. c).
rbd- inhibits sars-cov- infection without affecting ace enzymatic activity
the main driver of this study was to generate a tight inhibitor of ace for medicinal purposes, to be administered to the nose and lungs through inhalation. therefore, we had to verify that the evolved rbd does not interfere with the ace enzymatic activity, which is important in the renin-angiotensin-aldosterone system ( , ). we assayed the impact of rbd-wt and rbd- proteins on ace activity. neither the in vitro assay nor the assays done on various cells expressing ace showed much difference in ace activity with and without rbd-wt or rbd- added (figs. a, s ). finally, we explored the inhibition of viral entry by rbd-wt and rbd- . initially, we used lentivirus pseudotyped with spike protein variant sΔc ( ). this spike variant lacks the last amino acids, which are responsible for its retention in the endoplasmic reticulum. the relative cellular entry was analyzed by flow cytometry of lentivirus infection promoting gfp
fig. effect of rbd-wt and rbd- on ace activity and their potential to inhibit viral entry and infection. (a) ace activity (in vitro or on cells) assayed using sensolyte® ace activity assay kit. fluorogenic peptide cleavage by ace was measured at intervals of seconds over minutes. the activity rate is indicated by the slope of the plot [product/time]. an ace inhibitor (inh.), provided with the kit, was used as the negative control. (a).
ace activity was measured in vitro after the addition of nm of rbd-wt or rbd- to purified ace (upper panel). ace activity was measured following incubation with rbd-wt or rbd- on hela cells transiently transfected with human full-length ace (bottom panel). (b) inhibition of infection of hek- t cells stably expressing ace by lentivirus pseudotyped with sars-cov- spike protein. (c) inhibition of sars-cov- infection by rbd-wt and rbd- proteins.
signal. the hek- t cells stably expressing hace were pre-incubated with serial dilutions of the two rbds for h, and then the pseudovirus was added for hrs. results in fig. b show that the ec was reduced from nm for rbd-wt to . nm for rbd- . next, rbd-wt and rbd- were evaluated for their potency in inhibiting sars-cov- infection of veroe cells (fig. c). as with the pseudovirus, the ec was reduced from nm for rbd-wt to . nm for rbd- . more significantly, rbd- blocked > % of viral entry and replication, while rbd-wt blocked only ~ % of viral replication. the complete blockage of viral replication at a low nm concentration of rbd- makes it a promising drug candidate.
discussion
the sars-cov- pandemic is an ongoing event, with the virus constantly acquiring new mutations. intriguingly, the naturally selected mutations s n, e k, and n y of the spike protein rbd, which show higher infectivity, were selected by yeast surface display affinity maturation already in the first round, giving rise to the south-african, e k, n y, and british variants that bind ace and . -fold tighter than rbd-wt. three additional rounds of yeast display selection resulted in -fold tighter binding in comparison to rbd-wt.
the selection process took advantage of combinatorial selection, without compromising protein stability. the high-affinity binder, rbd- , was evaluated as a potential drug and was shown to efficiently block ace without affecting its important enzymatic activity. while natural virus selection is not as efficient as in vitro selection, the gained information on the more critical mutations can be used as a tool to identify emerging mutations. we hypothesize that e r will continue to spread and will become more dominant, especially in combination with n y. in contrast, we do not expect the rapid spread of s p. importantly, the mutation q r appeared in the library b after the incorporation of tyr at position . this combination dramatically increased the affinity to below pm, as shown by the difference between rbd- and rbd- (table ). notably, the wild-type rbd codon at position is caa, allowing for a direct change to the arginine codon cga. r has not yet been sampled by the virus (fig. a), but its appearance should be carefully monitored. moreover, r is located in a hypervariable location of the rbd (fig. s ), which makes its appearance more plausible. we successfully solved the cryo-em structure of rbd- to high resolution. the structure shows that rbd- has much improved electrostatic complementarity with ace , in relation to rbd-wt (figure ). this can be attributed to the use of the fast association protocol. the structure contains many of the currently evolving mutations (s n, e k, n y) and can serve not only as a valuable source of information but also as a “crystal ball” to predict future virus evolution steps.
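The single-nucleotide argument made above for the caa to cga change can be checked mechanically. The sketch below is illustrative only and is not part of the authors' analysis: it builds the standard genetic code table and lists every codon, with its encoded amino acid, that is reachable from a given codon by exactly one nucleotide substitution. The function name `single_nucleotide_neighbours` is a hypothetical helper introduced here.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def single_nucleotide_neighbours(codon):
    """Map each codon one substitution away from `codon` to its amino acid.

    Hypothetical helper for illustration; '*' denotes a stop codon.
    """
    codon = codon.upper()
    result = {}
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                result[mutant] = CODON_TABLE[mutant]
    return result


# The wild-type codon named in the text (caa, glutamine).
neighbours = single_nucleotide_neighbours("caa")
reachable = sorted(set(neighbours.values()))
print(neighbours)
print(reachable)
```

Arginine appears among the reachable residues because cga differs from caa only at the second position, which is the property the discussion relies on. Substitutions that need two or more nucleotide changes do not appear in the returned mapping.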
to evaluate the effect of mutations in the rbd on antibody binding, we manually inspected antibody-rbd (nanobody, spike) structures for clashes. of the antibodies bind outside the rbm and interactions are similar with rbd- and rbd-wt. however, for antibodies, a decrease in the number of contacts was observed and in cases major clashes with rbd- are observed (fig. s ). notably, e r and q r caused most of the observed effects. these findings suggest the need for close monitoring of the efficiency of drugs and vaccines for current and future mutations. an intriguing question is whether the spreading of the tighter binding sars-cov- variants in humans is accidental. from the similarity to yeast display selection, where stringent conditions are used, one may hypothesize that stringent selection is also driving the rapid spread of these mutations. face masks of low quality (which are by far the most abundant) would provide such selection conditions, as they reduce exhaled viral titers, giving tighter binding variants an advantage over wt to spread rapidly in the population (as a result of r of mutated viruses being > , while < for wt viruses). this should be urgently investigated, as one may consider the mandatory use of higher quality face masks, which will reduce viral titer to below infection levels (as indeed seen with medical personnel who use such masks) and stop the spread of these tighter binding virus mutations.
acknowledgments
funding: this research was supported by the israel science foundation (grants no. / and / ) within the killcorona – curbing coronavirus research program and by the ben b. and joyce e. eisenberg foundation. author contributions: j.z. and g.s. conceived the project; j.z., s.m., m.s., e.z., j.c., b.m. and g.s. performed experiments; n.e. prepared cryo-em samples and built atomic models and refined structures with o.d.; j.z., n.e., o.d. and g.s. wrote the manuscript. competing interests: the authors j.z. and g.s.
declare the us provisional patent application no. / , (yeda ref.: - ). data and materials availability: maps and atomic coordinates have been deposited in the protein data bank (www.rcsb.org) and the electron microscopy data bank (www.ebi.ac.uk/pdbe/emdb) with accession codes xxx and xxx, respectively.
supplementary materials: materials and methods; supplementary text; table s – s ; figs. s – s ; references ( – )
references
. m. hoffmann et al., sars-cov- cell entry depends on ace and tmprss and is blocked by a clinically proven protease inhibitor. cell , - .e ( ). . j. lan et al., structure of the sars-cov- spike receptor-binding domain bound to the ace receptor. nature , - ( ). . d. wrapp et al., cryo-em structure of the -ncov spike in the prefusion conformation. science , - ( ). . w. tai et al., characterization of the receptor-binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine. cellular & molecular immunology , - ( ). . a. c. walls et al., structure, function, and antigenicity of the sars-cov- spike glycoprotein. cell , - .e ( ). . t. n. starr et al., deep mutational scanning of sars-cov- receptor binding domain reveals constraints on folding and ace binding. cell , - .e ( ). . h. tegally et al., emergence and rapid spread of a new severe acute respiratory syndrome-related coronavirus (sars-cov- ) lineage with multiple spike mutations in south africa. medrxiv, . . . ( ). . j. chen, r. wang, m. wang, g.-w. wei, mutations strengthened sars-cov- infectivity. journal of molecular biology , - ( ). . s. lukassen et al., sars-cov- receptor ace and tmprss are primarily expressed in bronchial transient secretory cells. embo j , e -e ( ).
. c. g. k. ziegler et al., sars-cov- receptor ace is an interferon-stimulated gene in human airway epithelial cells and is detected in specific cell subsets across tissues. cell , - .e ( ). . l. dai, g. f. gao, viral targets for vaccines against covid- . nature reviews. immunology, - ( ). . s. h. nile et al., covid- : pathogenesis, cytokine storm and therapeutic potential of interferons. cytokine & growth factor reviews , - ( ). . c. o. barnes et al., sars-cov- neutralizing antibody structures inform therapeutic strategies. nature , - ( ). . t. m. abd el-aziz, a. al-sabi, j. d. stockand, human recombinant soluble ace (hrsace ) shows promise for treating severe covid- . signal transduction and targeted therapy , ( ). . d. schütz et al., peptide and peptide-based inhibitors of sars-cov- entry. adv drug deliv rev , - ( ). . l. cao et al., de novo design of picomolar sars-cov- miniprotein inhibitors. science , - ( ). . d. d. f. lelis, d. f. d. freitas, a. s. machado, t. s. crespo, s. h. s. santos, angiotensin- ( - ), adipokines and inflammation. metabolism , - ( ). . h. zhang, j. m. penninger, y. li, n. zhong, a. s. slutsky, angiotensin-converting enzyme (ace ) as a sars-cov- receptor: molecular mechanisms and potential therapeutic target. intensive care medicine , - ( ). . g. wang et al., dalbavancin binds ace to block its interaction with sars-cov- spike protein and is effective in inhibiting sars-cov- infection in animal models. cell research, ( ). . j. zahradník, d. dey, s. marciano, g. schreiber, an enhanced yeast display platform demonstrates the binding plasticity under various selection pressures. biorxiv, . . . ( ). . g.
chao et al., isolating and engineering human antibodies using yeast surface display. nature protocols , - ( ). . s. elbe, g. buckland-merrett, data, disease and diplomacy: gisaid's innovative contribution to global health. global challenges , - ( ). . s. kemp et al., recurrent emergence and transmission of a sars-cov- spike deletion Δh /Δv . biorxiv, . . . ( ). . r. cohen-khait, g. schreiber, selecting for fast protein–protein association as demonstrated on a random tem yeast library binding blip. biochemistry , - ( ). . t. selzer, s. albeck, g. schreiber, rational design of faster associating and tighter binding protein complexes. nature structural biology , - ( ). . h. cohen-dvashi et al., coronacept – a potent immunoadhesin against sars-cov- . biorxiv, . . . ( ). . l. benatuil, j. m. perez, j. belk, c.-m. hsieh, an improved yeast transformation method for the generation of very large human antibody libraries. protein engineering, design and selection , - ( ). . a. r. aricescu, w. lu, e. y. jones, a time- and cost-efficient system for high-level protein production in mammalian cells. acta crystallographica. section d, biological crystallography , - ( ). . y. peleg, t. unger, application of the restriction-free (rf) cloning for multicomponents assembly. methods in molecular biology (clifton, n.j.) , - ( ). . d. s. wilson, a. d. keefe, random mutagenesis by pcr. current protocols in molecular biology chapter , unit . ( ). . r. d. gietz, yeast transformation by the liac/ss carrier dna/peg method. methods in molecular biology (clifton, n.j.) , - ( ). . d. n. mastronarde, automated electron microscope tomography using robust prediction of specimen movements. journal of structural biology , - ( ). . a. punjani, j. l. rubinstein, d. j. fleet, m. a. brubaker, cryosparc: algorithms for rapid unsupervised cryo-em structure determination. nature methods , - ( ).
. a. punjani, h. zhang, d. j. fleet, non-uniform refinement: adaptive regularization improves single-particle cryo-em reconstruction. nature methods , - ( ). . a. punjani, d. j. fleet, d variability analysis: resolving continuous flexibility and discrete heterogeneity from single particle cryo-em. biorxiv, . . . ( ). . p. d. adams et al., phenix: a comprehensive python-based system for macromolecular structure solution. acta crystallographica. section d, biological crystallography , - ( ). . b. p. klaholz, deriving and refining atomic models in crystallography and cryo-em: the latest phenix tools to facilitate structure analysis. acta crystallographica. section d, structural biology , - ( ). . p. emsley, k. cowtan, coot: model-building tools for molecular graphics. acta crystallographica. section d, biological crystallography , - ( ). . v. b. chen et al., molprobity: all-atom structure validation for macromolecular crystallography. acta crystallographica. section d, biological crystallography , - ( ). . e. f. pettersen et al., ucsf chimera – a visualization system for exploratory research and analysis. journal of computational chemistry , - ( ). . f. amanat et al., a serological assay to detect sars-cov- seroconversion in humans. nature medicine , - ( ).
supplementary materials
materials and methods
cloning and dna manipulations
the rbd domain variants (see table s ) were pcr amplified (kapa hifi hotstart readymix, roche, switzerland) from codon-optimized sars-cov- spike protein gene (sino biological, sars-cov- ( -ncov) cat: vg -ut, genbank: qhd . ) by using appropriate primers. amplicons were purified by using nucleospin® gel and pcr clean-up kit (macherey-nagel, germany) and eluted in ddw. yeast surface display plasmids pjydc (addgene id: ) and pjydc ( ) were cleaved by ndei and bamhi (neb, usa) restriction enzymes, purified, and tested for non-cleaved plasmids via transformation to e. coli cloni® g cells (lucigen, usa). each amplicon was mixed with cleaved plasmid in the ratio: µg insert: µg plasmid per construct, electroporated in s. cerevisiae eby ( ), and selected by growth on sd-w plates. cloning of the ace extracellular domain (aa g -y ) gene and rbds into vectors phl-sec ( ) was done in two steps. initially, the rbd gene was inserted in helper vector pca by restriction-free cloning ( ). pca is a phl-sec derivative lacking bp in the gc rich region (nt - ). in the second step, the correctly inserted rbds with flanking sequences, verified by sequencing, were cleaved by using restriction enzymes xbai and xhoi (neb, usa) and ligated (t dna ligase, neb, usa) in cleaved full-length plasmid phl-sec. site-directed mutagenesis of rbds was performed by restriction-free cloning procedure ( ). megaprimers were amplified by kapa hifi hotstart readymix (roche, switzerland), purified with nucleospin™ gel and pcr clean-up kit (macherey-nagel, germany), and subsequently inserted by pcr into the destination plasmid using high fidelity phusion® (neb, usa) or kapa polymerases. the parental plasmid molecules were inactivated by dpni treatment ( h, neb, usa) and the crude reaction mixture was transformed to electrocompetent e. coli cloni® g cells (lucigen, usa).
the clones were screened by colony pcr and their correctness was verified by sequencing.
dna libraries preparation
sars-cov- rbd gene (rbd) libraries were prepared by mncl error-prone mutagenesis ( ) using taq ready-mix (hylabs, israel). the mutagenic pcr reactions ( µl) were supplemented with increasing mncl concentrations: . , . , . , . , . , . and . nm. template dna concentration ranged between and ng per reaction and – reaction cycles were applied. the amplified dna was purified, pooled, and used directly for yeast transformation via electroporation. the whole gene randomization amplicon comprised rbd and the linker between it and aga p protein (nucleotides - – , pjydc vector). libraries b , b , and b were prepared by homologous recombination of an invariant fragment of rbd with necessary overlaps ( – ) and the mutagenized library fragment ( – ). the mutagenic fragments were prepared by the same error-prone pcr procedure ( cycles).
yeast transformation, cultivation, and expression procedures
all the procedures and our enhanced yeast display platform itself are described in detail elsewhere ( ). briefly, plasmids were transformed into saccharomyces cerevisiae eby ( , ). single colonies were inoculated into . ml liquid sd-caa media ( ), and grown overnight at °c ( rpm). the overnight cultures were spun down ( g, min) and the exhausted culture media was removed before dilution in the expression media / ( ) to od ~ . the expression cultures were grown at different temperatures ( , , and °c) for – h at rpm, depending on the experimental setup.
the expression co-cultivation labeling was achieved by the addition of nm dmso-solubilized bilirubin (pjydc , eunag reporter holo-form formation, green/yellow fluorescence (ex. nm, em. nm)) or nm alfa-tagged mneongreen (pjydc , dnbalfa). aliquots of cells ( µl) were collected by centrifugation ( g, min), resuspended in ice-cold pbsb buffer (pbs with g/l bsa), passed through a cell strainer nylon membrane ( µm, spl life sciences, korea), and analyzed.
binding assays and affinity determination using yeast surface display
aliquots of yeast expressed and labeled cells ready for flow-cytometry analysis were resuspended in analysis solution with a series of labeled ace concentrations. the concentration range of cf® r succinimidyl ester-labeled (biotium, usa) ace extracellular domain (aa q – s ) depended on the protein analyzed ( . pm – nm). the analysis solution volume ( – ml) and the incubation time ( h – h, rpm, °c) were adjusted to avoid ligand depletion greater than % and to allow equilibrium to be reached ( ). after the incubation, samples were collected ( g, min), resuspended in µl of ice-cold pbsb buffer ( µl), passed through a cell strainer, and analyzed. the expression and binding signals were determined by flow cytometry using bd accuri™ c flow cytometer (bd biosciences, usa). the cell analysis and sorting were done by s e cell sorter (biorad, usa). the analysis was done by single-cell event gating (fig. s ), green fluorescence channel (fl -a) was used to detect rbd expression positive cells (rbd+) via eunag or dnbalfa, and far-red fluorescent channel (fl -a) recorded cf® r labeled ace binding signals (cf +).
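A titration of this kind yields one mean binding signal per ligand concentration, from which an apparent kd is obtained by least-squares fitting of a non-cooperative binding curve. The sketch below is a minimal, dependency-free illustration and not the authors' script: it assumes the simple model signal = bmax · c / (kd + c), ignores ligand depletion and any background term, and uses made-up concentrations and parameter values. The names `hill_signal` and `fit_kd` are hypothetical. For each trial kd the best bmax has a closed form, so the fit reduces to a one-dimensional search over log10(kd).

```python
import math


def hill_signal(conc, kd, bmax):
    """Non-cooperative (Hill coefficient 1) binding signal."""
    return bmax * conc / (kd + conc)


def _bmax_and_sse(log_kd, concs, signals):
    """Closed-form best bmax and residual sum of squares for a trial kd."""
    kd = 10.0 ** log_kd
    frac = [c / (kd + c) for c in concs]
    bmax = sum(s * f for s, f in zip(signals, frac)) / sum(f * f for f in frac)
    sse = sum((s - bmax * f) ** 2 for s, f in zip(signals, frac))
    return bmax, sse


def fit_kd(concs, signals, grid_points=200, refine_steps=60):
    """Fit kd and bmax: coarse log-grid scan, then golden-section refinement."""
    lo = math.log10(min(concs)) - 2.0
    hi = math.log10(max(concs)) + 2.0
    step = (hi - lo) / (grid_points - 1)
    grid = [lo + i * step for i in range(grid_points)]
    best = min(grid, key=lambda g: _bmax_and_sse(g, concs, signals)[1])
    a, b = best - step, best + step
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(refine_steps):
        c1 = b - inv_phi * (b - a)
        c2 = a + inv_phi * (b - a)
        if _bmax_and_sse(c1, concs, signals)[1] < _bmax_and_sse(c2, concs, signals)[1]:
            b = c2
        else:
            a = c1
    log_kd = 0.5 * (a + b)
    bmax, _ = _bmax_and_sse(log_kd, concs, signals)
    return 10.0 ** log_kd, bmax


# Synthetic, noise-free titration with assumed values (arbitrary units).
concs = [1.0, 3.0, 10.0, 30.0, 100.0, 300.0, 1000.0, 3000.0]
signals = [hill_signal(c, kd=50.0, bmax=1000.0) for c in concs]
kd_fit, bmax_fit = fit_kd(concs, signals)
print(kd_fit, bmax_fit)
```

Searching in log space keeps the fit stable across the several orders of magnitude that a titration series typically spans. A model that also fits the concentration of displayed protein, as the methods describe, would need an additional parameter and a depletion-aware binding expression.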
the eunag signals were automatically compensated by the prosort™ software and pjydnp positive control plasmid (addgene id ( )). the mean fl -a fluorescence signal values of rbd+ cells, after subtraction of the rbd- signal, were used for determination of the binding constant kd. the standard non-cooperative hill equation was fitted by nonlinear least-squares regression using python . . the total concentration of yeast exposed protein was fitted together with two additional parameters describing the given titration curve ( ).
production and purification of rbd and ace proteins
the extracellular part of ace (q – s ) and rbd protein variants (table s ) were produced in expi f cells (thermofisher). pure dna was transfected using expifectamine transfection kit (thermofisher) following the manufacturer's protocol. hours post-transfection, the cells were centrifuged at rpm for minutes. the supernatant was filtered using a . µm nalgene, thermofisher filter and the pellet was discarded. the filtered supernatant was loaded onto a ml histrap fast flow column (cytiva (ge, usa), cat - - ). Äkta pure (cytiva, usa) was used to purify the protein. the column was washed in mm tris, mm nacl, mm imidazole; then the protein was eluted using gradient elution with elution buffer containing mm tris, mm nacl, m imidazole. buffer exchange to pbs and concentration of the protein were done by using amicon® units (merck millipore ltd, cat:ufc ).
cryo-electron microscopy
sample preparation: . µl of ace -rbd- complex at . mg/ml concentration was transferred to glow discharged ultraufoil r . / . mesh grids (quantifoil), blotted for . seconds at °c, % humidity, and plunge frozen in liquid ethane cooled by liquid nitrogen using a vitrobot plunger (thermo fisher scientific).
cryo-em image acquisition: cryo-em data were collected on a titan krios g i transmission electron microscope (thermo fisher scientific) operated at kv. movies were recorded on a k direct detector (gatan) installed behind a bioquantum energy filter (gatan), using a slit of ev. movies were recorded in counting mode at a nominal magnification of , x, corresponding to a physical pixel size of . Å. the dose rate was set to . e-/pixel/sec, and the total exposure time was . sec, resulting in an accumulated dose of e-/Å . each movie was split into frames of . sec. the nominal defocus range was - . to - . µm; however, the actual defocus range was larger. imaging was done using an automated low dose procedure implemented in serialem ( ). a single image was collected from the center of each hole using image shift to navigate within hole arrays and stage shift to move between arrays. the ‘multiple record setup’ together with the ‘multiple hole combiner’ dialogs were used to map hole arrays of up to x holes. beam tilt was adjusted to achieve coma-free alignment when applying image shift.
cryo-em image processing: image processing was performed using cryosparc software v . . ( ). the processing scheme is outlined in fig. s . a total of acquired movies were subjected to patch motion correction, followed by patch ctf estimation. of these, micrographs having ctf fit resolution better than Å and relative ice thickness lower than . were selected for further processing. initial particle picking was done using the ‘blob picker’ job on a subset of micrographs. extracted particles were iteratively classified in d and their class averages were used as templates for automated particle picking from all selected micrographs, resulting in , , picked particles. particles were extracted, binned x ( - pixel box size, . Å/pixel), and cleaned by multiple rounds of d classification, resulting in , , particles.
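The acquisition and binning figures quoted in this section are linked by two simple identities: the accumulated dose per unit area equals the per-pixel dose rate times the exposure time divided by the squared pixel size, and binning by a factor n multiplies the pixel size by n. The sketch below illustrates both relations with placeholder numbers that are assumptions for the example only and are not the values used in this study.

```python
def accumulated_dose(dose_rate_e_per_px_s, exposure_s, pixel_size_angstrom):
    """Accumulated dose in e-/A^2 from a per-pixel dose rate in e-/pixel/s."""
    return dose_rate_e_per_px_s * exposure_s / pixel_size_angstrom ** 2


def dose_per_frame(total_dose, n_frames):
    """Dose carried by each movie frame, assuming equal-length frames."""
    return total_dose / n_frames


def binned_pixel_size(pixel_size_angstrom, bin_factor):
    """Effective pixel size in A/pixel after binning by `bin_factor`."""
    return pixel_size_angstrom * bin_factor


# Placeholder values, chosen only to make the arithmetic easy to follow.
total = accumulated_dose(dose_rate_e_per_px_s=20.0, exposure_s=2.0,
                         pixel_size_angstrom=0.8)
print(total)                       # 20 * 2 / 0.64 e-/A^2
print(dose_per_frame(total, 50))   # dose in each of 50 equal frames
print(binned_pixel_size(0.8, 4))   # pixel size after 4x binning
```

Keeping these conversions in small functions makes it easy to confirm that a reported dose, exposure time and pixel size are mutually consistent.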
these particles were used for ab initio d reconstruction with classes. out of the classes only one, containing , particles, refined to high resolution. two additional classes may show ace in a closed conformation (containing , and , particles); however, they did not refine, partially because of preferred orientation. the d class containing , particles was refined as follows: particles were re-extracted only from micrographs with defocus lower than . µm, binned x , and subjected to homogeneous refinement ( , particles, -pixel box size, . Å/pixel). the particles were then sub-classified into classes, and particles from the higher-resolution class were re-extracted without binning in -pixel boxes and subjected to per-particle motion correction, followed by non-uniform refinement ( ) with per-particle defocus optimization. the final map, at a resolution of . Å (fig. s ), was sharpened with a b-factor of - before atomic model building. in the final map, the rbd is only partially resolved at the distal region from the ace interface. to better understand the reason for the missing density, we subjected the particles from the well-refined d class ( , particles) to variability analysis ( ), with a binary mask imposed on the rbd region (fig. s ). classification into distinct classes based on eigenvectors revealed variable density at the rbd distal region, which could not be modeled reliably. the cryo-em data collection process and refinement statistics are summarized and visualized in figs. s , s , s , and table s .
model building: the atomic model of the ace -rbd- was solved by docking into the cryo-em maps the homologous refined structure of the sars-cov- spike receptor-binding domain bound with ace (pdb-id m j) as a model, using the dock-in-map program in phenix ( ). all steps of atomic refinements were carried out with the real-space refinement in phenix ( ). the model was built into the cryo-em map by using the coot program ( ). the ace -rbd- model was evaluated with the molprobity program ( ). the ace (g -y ) contains one zinc ion linked to h , h , and e and three n-acetyl-β-glucosaminide (nag) glycans linked to n , n , and n . in the rbd- structure (t -k ) three fragments, r -s (β , α ), g -v (α ), and f -h (β ), are disordered and thus not visible in the electron density map. details of the refinement statistics of the ace -rbd structure are described in table s . d visualization and analyses were performed using ucsf chimera ( ) and pymol (schrödinger, inc.; . . ).
analysis of rbd circulating virus variants
all amino acid substitutions in the rbd ( ) were downloaded from the gisaid database ( december ) ( ) with the corresponding numbers of sequences and regions and plotted against the binding (Δlog (kd,app)) or expression (Δlog mfi) extracted from the rbd deep mutational scanning dataset ( ). we gratefully acknowledge all gisaid contributors and starr et al. for sharing their data.
octet red binding analysis
octet red system (fortébio, pall corp., usa) was used for real-time binding determination. briefly, µg/ml of ace diluted in mm na-acetate ph . was immobilized on an amine-reactive g biosensor using the standard procedure. the purified rbd was diluted in a sample buffer (pbs+ . % bsa+ . 
% tween ). analyte concentrations, association, and dissociation times were adjusted per sample. data analysis v software (fortébio, pall corp., usa) was used for data fitting, with the mathematical model assuming a simple : stoichiometry.
pseudo-virus production and inhibition of infection by rbd
pseudo-virus production: sars-cov- -spike pseudotyped lentivirus was produced by co-transfection of hek t cells with pcmv Δr . , pgipz-gfp ( ), and pcmv sΔc at a ratio of : : . hours before the transfection x cells were seeded into a cm plate. on the day of the transfection cells were washed with dulbecco's modified eagle's medium (dmem) (gibco ) and ml of opti-mem (gibco ) was added to the plate. µg of plasmid mix was transfected using lipofectamine transfection reagent (thermo fisher ) according to the manufacturer’s instructions. after hours, the media was replaced by ml of fresh media. the supernatant was harvested h post-transfection, centrifuged ( g, min), and filtered to remove all residual debris (millex-hv syringe filter unit, . µm).
rbd inhibition assay: hek- t cells stably expressing hace (genscript m ) were seeded into a -well plate at an initial density of x cells per well. the following day cells were pre-incubated with serial dilutions of rbds ( h) and then the pseudotyped lentivirus was added. after h, the cell medium was replaced with fresh dmem, and cells were grown for an additional h. after this procedure, cells were harvested and the gfp signal was analyzed by flow cytometry (bd accuri™ c plus flow cytometer, bd biosciences, usa).
inhibition of sars-cov- infection
the strain -ncov/idf / was supplied by the national reference centre for respiratory viruses hosted by institut pasteur (paris, france) and headed by dr. sylvie van der werf. the human sample from which strain -ncov/idf / was isolated has been provided by dr. x. lescure and pr. y. yazdanpanah from the bichat hospital. the experiments were done by institut pasteur.
veroe (c ) cells were grown in dmem with % serum and % penicillin to % confluence in well format and incubated with rbds at the given concentration for hrs before . moi of sars-cov- was added for one hour. the inoculum was subsequently removed and a medium with the rbd was added. after hrs of incubation, the supernatant was recovered and viral load was measured using rt-pcr with forward primer: taatcagacaaggaactgatta, reverse primer: cgaaggtgtgacttccatg. in parallel, cell viability was assessed after hrs of incubation using the celltiter glo kit from promega. raw data are normalized against appropriate negative and positive controls and are expressed as the fraction of virus inhibition. the curve fit was performed using the variable hill slope model of the four-parameter logistic curve:

response = baseline + (max − baseline) / (1 + 10^((logec50 − log(c)) × hill))

ace activity assay

human ace activity was evaluated using the sensolyte® ace activity assay kit (anaspec; cat# ) according to the manufacturer's protocol, with the following changes: the assay was performed in -well plates with a ratio of : of the recommended volume of buffer, substrate, and inhibitor. the activity was measured on either purified ace ( . ng; abcam, ab ) or on the following cell lines: hela transiently transfected with ace ( cells per assay), hek- t stably transfected with ace (genscript m , cells per assay), and caco cells ( , cells per assay). to assess the effect of rbd on ace activity, nm wt rbd or rbd-b was added before activity measurement. the activity rate is indicated by the slope of the plot [product/time].
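The variable-slope four-parameter logistic model named for the virus inhibition curve fit can be sketched as follows. This is a minimal illustration in its conventional form, in which the Hill slope multiplies the difference between log EC50 and log concentration; that form, the function name `four_pl` and all parameter values below are assumptions made for illustration, not values or code from the study.

```python
import math


def four_pl(c, baseline, maximum, log_ec50, hill):
    """Variable-slope four-parameter logistic response at concentration c (> 0).

    Conventional form (assumed here):
    baseline + (maximum - baseline) / (1 + 10 ** ((log_ec50 - log10(c)) * hill))
    """
    exponent = (log_ec50 - math.log10(c)) * hill
    return baseline + (maximum - baseline) / (1.0 + 10.0 ** exponent)


# Hypothetical parameters: inhibition runs from 0 to 1 with an EC50 of 1 nM.
params = dict(baseline=0.0, maximum=1.0, log_ec50=-9.0, hill=1.5)

for conc in (1e-11, 1e-10, 1e-9, 1e-8, 1e-7):
    # At c equal to the EC50 the response is halfway between baseline and maximum.
    print(f"{conc:.0e} M -> fraction inhibited {four_pl(conc, **params):.3f}")
```

With a positive Hill slope the response rises from the baseline towards the maximum as the concentration increases, which matches data expressed as a fraction of virus inhibition. In practice the four parameters would be estimated from the normalized data by nonlinear least squares.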
supplementary text

optimizing the rbd domain length for yeast display and protein expression

to optimize the rbd for yeast display, we screened multiple different constructs for yeast surface expression. rbds of different starting and termination positions were cloned in a pjydc vector and their impact on expression, stability, and ace binding was determined (table s ). the rbdcon was the shortest construct, lacking the last c-terminal loop of the rbd domain ( – ) and including one unpaired cysteine. this resulted in poor expression and binding. the rbdcon and con included this loop, resulting in domain stabilization and an increase in both binding and expression. although the rbdcon ( ) construct demonstrated high expression yields both in yeast and in expi f™ cells, as well as good thermo-stability, we decided not to use it in yeast display since one unpaired cysteine (c ) is close to its c-terminus and the construct contains part of the neighboring domain. we continued with the rbdcon and rbdcon constructs for yeast display and protein expression in expi f™ cells, respectively.

supplementary tables

table s – comparison of different rbd domains for yeast display and protein expression.

columns: construct; position (a); number of aa; size [kda]; yeast expression [mean fl * ] (b); yeast display estimated kd [nm] (c); melting temperature [°c]

rbdcon - . . . ± .
rbdcon - . . . ± . . ± .
rbdcon b - . . . ± .
rbdcon - . . . ± .
rbdcon - . . . ± . . ± .

(a) numbers are according to uniprotkb- p dtc
(b) measured in pjydc (eunag fluorescence signal)
(c) binding affinity against ace was determined by facs, with the relevant construct expressed on the yeast surface.
table s analysis of mutant clones selected by yeast display. * - the yeast display affinity was determined using different concentrations of ace (scr – see fig. s ) or by full titration curve (full).

table s : cryo-em data collection and refinement statistics of ace -rbd

data collection em equipment voltage (kv) detector energy filter pixel size (Å) electron dose (e-/Å ) defocus range (µm) number of collected micrographs number of selected micrographs d reconstruction software number of used particles resolution (Å) symmetry map sharpening b factor (Å ) pdb code refinement software cell dimensions (Å) model composition protein residues atoms sugar zn rmsd bonds length (Å) bonds angle (°) ramachandran plot statistics (%) preferred allowed outlier titan krios (thermo fisher scientific) k (gatan) bioquantum (gatan), ev slit . - . to - . , , cryosparc , . c - xxx phenix . , . . . . .

supplementary material figures:

fig. s the i f mutation, selected by yeast surface display, increases protein stability and expression. a) the position of the i f mutation (bright yellow) in the rbd structure (pdb id m ) and the neighboring residues within Å distance (pale yellow). b) the residues involved in the formation of the hydrophobic cavity around the i f mutation, predicted from the x-ray structure. additional residues that are involved: k , r , s , v , y .
inset: the wild-type residue (isoleucine in magenta) overlaid with the phenylalanine mutant.

fig. s gating and selection strategies for in vitro evolution of sars-cov- rbd domain. a, b) gating strategy for facs sorting. in the first step, yeast cells are isolated by their fsc-a and ssc-a properties (a). in the second step (b), single cells are isolated by their fsc properties (area and height) on the diagonal plot. the green area represents the gated region. c) selection strategy for affinity maturation. the library was titrated with a range of ace concentrations to select the concentration with limited signal (inset ). under such conditions, the tighter binding clones gain the highest advantage over the parental population. using less stringent selection (insets – ) reduces the advantage of the tighter binders. using too low concentrations of ace protein will also result in loss of selectivity. d) affinity maturation library after sorts, where the separation between the parental and tighter binding populations is well defined. the top . – . % of cells were sorted (green region). e) fast association selection strategy. the library was incubated with a constant concentration ( pm) of ace for different times. the time with minimal signal was determined and used for the selection of clones with faster association. the same shape of the sorting region as in (d) was applied.

fig.
s evaluating the binding affinity of individual clones from libraries b and fa. five single clones from each library were evaluated for binding to ace to determine the range of affinity maturation after facs selection. each clone was incubated with four (library b ) or six (fa) different concentrations of ace . the binding curve was fitted using additional parameters describing the curve minimum and maximum as determined from the rbd-wt titration curve. calculated affinities are in table s .

fig. s protein purification of ace , rbd- and the complex between the two. both proteins were expressed in expi f cells and secreted into the cell culture media. a) sds-page analysis after ni-nta agarose purification of the ace receptor extracellular portion (aa q – s ). b) sds-page analysis after ni-nta agarose purification of rbd- (aa - ). c) the ace + rbd- complex was purified on a gel filtration chromatography column prior to cryo-em. ace protein was mixed with an excess of rbd- ( : . ), incubated for h on ice, and applied to the chromatography column using an Äkta pure fplc system. the first peak corresponds to the complex (sds-gel inset) and the second peak represents excess rbd- .

fig. s sars-cov- rbd mutations in the population and their expression level. a) relation between the impact of mutations on yeast surface expression and their occurrence in the population.
expression was measured as the mean fluorescence intensity (mfi) of the specific clone expressed on the yeast surface by starr et al. ( ) (black and red) or by us (blue, inset). empty squares and black dots show data with < or > sequences recorded, respectively. the emerging mutations in the population are shown in red. the graph shows that the variance in expression decreases with higher occurrence in the population. b) relation between the affinity (x-axis), expression (y-axis), and the occurrence in the population: empty squares < sequences; black dots > sequences; red dots represent four emerging mutants (all with more than sequences). based on a and b, rapidly spreading mutations have increased ace binding affinity without compromising protein stability.

fig. s site-directed mutagenesis of rbd- , using affinity enhancing mutations. mutations were predicted to enhance rbd-ace binding ( ). these mutations were evaluated for enhancing the affinity of rbd- towards ace . a) impact of mutations, on top of rbd- , on ace binding (y-axis) and yeast surface expression. three mutations (orange circles), which have the highest impact on expression, were combined in rbd- (red triangle). b) localization of stabilizing (yellow) and binding enhancing mutations depicted in the rbd structure (pdb id m , best rotamer is shown). c) binding curve of rbd- with rbd- for comparison. d) normalized protein melting curves for rbd-wt, rbd- , rbd- , and rbd- measured using the tycho nt. (nanotemper).
fig. s single-particle cryo-em processing scheme. the details of the process are described in the methods section under “cryo-em image processing”. the number of particles in each map is indicated under the map’s image, along with the map’s resolution where relevant.

fig. s resolution estimate and angular distribution for the ace -rbd- cryo-em map. (a) fourier shell correlation (fsc) curves. (b) angular distribution plot. (c) an alpha-helical segment showing the map density and fitted atomic coordinates. (d) cryo-em map colored according to local resolution estimate. the inset shows a slice through the rbd-ace interface.

fig. s variability analysis of the rbd. (a) particle images from the well-resolved d class were subjected to d variability analysis. (b) central slices through the three eigenimages calculated with a binary mask around the rbd region. (c) five d classes, which were calculated based on the eigenimages. the maps show variable density for the rbd.

fig. s global comparison between rbd-wt and rbd- .
a) the rbd- preserves its typical twisted five-stranded antiparallel β sheet (β , β -β , and β ) with an extended insertion containing the short β -β strands, α , and η helices and loops. the largest differences are found between m and f (black circle). b) the upper part, comprising three segments, r -s (β , α ), g -v (α ), and f -h (β ), is not resolved in the electron density map (blue ribbon, added from pdb id: m j).

fig. s an analysis of conserved positions computed by the consurf server, depicted on the rbd- structure. the amino acids are colored by their conservation grades, with turquoise-through-maroon indicating variable-through-conserved.

fig. s rbd- mutations interfere with binding to multiple antibodies. the rbd- (magenta) was structurally overlaid with rbd-wt (white). the s n, e k, q r, and n y rbd mutated residues were analyzed for disruptive contacts/clashes with the corresponding binding antibody/nanobody (green) in relation to rbd-wt. four examples are shown, a) pdb id: yz , b) pdb id: can, c) pdb id: jvb, d) pdb id: che, where rbd- (but not rbd-wt) forms severe clashes with the second chain. further experimental evaluation is needed to support our observation.
fatty acid oxidation participates of the survival to starvation, cell cycle progression and differentiation in the insect stages of trypanosoma cruzi

rodolpho ornitz oliveira souza¹, flávia silva damasceno¹, sabrina marsiccobetre , marc biran , gilson murata , rui curi , frédéric bringaud , ariel mariano silber¹*

¹ university of são paulo, laboratory of biochemistry of tryps – labtryps, department of parasitology, institute of biomedical sciences – são paulo, sp, brazil
centre de résonance magnétique des systèmes biologiques (rmsb), université de bordeaux, cnrs umr- , bordeaux, france
university of são paulo, department of physiology, institute of biomedical sciences – são paulo, sp, brazil
cruzeiro do sul university, interdisciplinary post-graduate program in health sciences - são paulo, sp, brazil
laboratoire de microbiologie fondamentale et pathogénicité (mfp), université de bordeaux, cnrs umr- , bordeaux, france

*corresponding author e-mail: asilber@usp.br (ams)

the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by . international license. this version posted january , . biorxiv preprint doi: https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . /

abstract

during its complex life cycle, trypanosoma cruzi colonizes different niches in its insect and mammalian hosts. this characteristic determined the types of parasites that adapted to face challenging environmental cues. the primary environmental challenge, particularly in the insect stages, is poor nutrient availability. these t. cruzi stages could be exposed to fatty acids originating from the degradation of the perimicrovillar membrane.
in this study, we revisit the metabolic fate of fatty acid breakdown in t. cruzi. herein, we show that during parasite proliferation, the glucose concentration in the medium can regulate the fatty acid metabolism. at the stationary phase, the parasites fully oxidize fatty acids. [u- c]-palmitate can be taken up from the medium, leading to co production via beta-oxidation. lastly, we also show that fatty acids are degraded through beta-oxidation. additionally, through beta-oxidation, electrons are fed directly to oxidative phosphorylation, and acetyl-coa is supplied to the tricarboxylic acid cycle, which can be used to feed other anabolic pathways such as the de novo biosynthesis of fatty acids.

author summary

trypanosoma cruzi is a protist parasite with a life cycle involving two types of hosts, a vertebrate one (which includes humans, causing chagas disease) and an invertebrate one (kissing bugs, which transmit the infection among mammals). in both hosts, the parasite faces environmental challenges such as sudden changes in the metabolic composition of the medium in which it develops, severe starvation, osmotic stress and redox imbalance, among others. because kissing bugs feed infrequently in nature, an intriguing aspect of t. cruzi biology (it exclusively inhabits the digestive tube of these insects) is how it subsists during long periods of starvation. in this work, we show that this parasite performs a metabolic switch from glucose consumption to lipid oxidation, and it is able to consume lipids and lipid-derived fatty acids from both internal sources and externally supplied compounds. when fatty acid oxidation is chemically inhibited by etomoxir, a very well-known drug that inhibits the translocation of fatty acids into the mitochondria, the proliferative insect stage of the parasite has dramatically diminished survival under severe metabolic stress and its differentiation into its infective forms is impaired.
our findings place fatty acids in the centre of the scene regarding the parasite's extraordinary resistance to nutrient-depleted environments.

introduction

t. cruzi, a flagellated parasite, is the causative agent of chagas disease, a neglected health problem endemic to the americas [ ]. the parasite life cycle is complex, alternating between replicative and non-replicative forms in two types of hosts, mammals and triatomine insects [ ]. in mammalian hosts, two primary forms are recognized: replicative intracellular amastigotes and nondividing trypomastigotes, which are released from infected host cells into the extracellular medium. after being released from infected cells, trypomastigotes can spread the infection by infecting new cells, or they can be ingested by a triatomine bug during its blood meal. once inside the invertebrate host, the ingested trypomastigotes differentiate into epimastigotes, which initiate their proliferation and colonization of the insect digestive tract [ ]. once the epimastigotes reach the final portion of the digestive tube, they initiate differentiation into non-proliferative, infective metacyclic trypomastigotes. these forms will be expelled during a new blood meal and will be able to infect a new vertebrate host [ , – ]. the diversity of environments through which t. cruzi passes during its life cycle (i.e., the digestive tube of the insect vector, the bloodstream and the mammalian cell cytoplasm) subjects it to different levels of nutrient availability [ , ]. therefore, this organism evolved a robust, flexible and efficient metabolism [ , ].
as an example, it was recognized early on that epimastigotes are able to rapidly switch their metabolism, allowing the consumption of carbohydrates and different amino acids [ , ]. several studies identified aspartate, asparagine, glutamate [ ], proline [ – ], histidine [ ], alanine [ , ] and glutamine [ , ] as oxidisable energy sources. despite the quantity of accumulated information on amino acid and carbohydrate consumption, little is known about how t. cruzi uses fatty acids and how these compounds contribute to the parasite's metabolism and survival. in this study, we explore fatty acid metabolism in t. cruzi. we also address fatty acid regulation by external glucose levels and the involvement of their oxidation in the replication and differentiation of t. cruzi insect stages.

methods

parasites

epimastigotes of t. cruzi strain cl clone were maintained in the exponential growth phase by sub-culturing them every h in liver infusion tryptose (lit) medium at °c [ ]. metacyclic trypomastigotes were obtained through the differentiation of epimastigotes at the stationary growth phase in tau- aag (triatomine artificial urine supplemented with mm proline, mm glutamate, mm aspartate and mm glucose) as previously reported [ ].

fatty acid oxidation assays

preparation of palmitate-bsa conjugates. sodium palmitate at mm was solubilized in water by heating it up to °c. fatty acid-free bsa (ffa bsa) (sigma®) was dissolved in pbs and warmed up to °c with continuous stirring. solubilized palmitate was added to bsa at °c with continuous stirring (for a final concentration of mm in % bsa).
the conjugated palmitate-bsa was aliquoted and stored at − °c [ ].

co production from oxidisable carbon sources. to test the production of co from palmitate, glucose or histidine, exponentially growing epimastigotes ( x ml- ) were washed twice in pbs and incubated for different times ( , , and min) in the presence of . mm of palmitate spiked with . µci of c-u-substrates. to trap the produced co , whatman paper was soaked in m koh solution and placed at the top of the tube. the co trapped by this reaction was quantified by scintillation [ , ].

h-nmr analysis of the exometabolome. epimastigotes ( x ml- ) were collected by centrifugation at , x g for min, washed twice with pbs and incubated in ml (single point analysis) of pbs supplemented with g/l nahco (ph . ). the cells were maintained for h at °c in incubation buffer containing [u- c]-glucose, non-enriched palmitate or no carbon sources. the integrity of the cells during the incubation was checked by microscopic observation. the supernatant ( ml) was collected and µl of maleate solution in d o ( mm) was added as an internal reference. h-nmr spectra were collected at . mhz on a bruker avance iii hd spectrometer equipped with a mm prodigy cryoprobe. the measurements were recorded at °c. the acquisition conditions were as follows: ° flip angle, , hz spectral width, k memory size, and . sec total recycling time. the measurements were performed with scans for a total time of approximately min and sec. the resonances of the obtained spectra were integrated and the metabolite concentrations were calculated using the eretic nmr quantification bruker program.

oxygen consumption. to evaluate the importance of internal fatty acid sources in o consumption, exponentially growing parasites were treated or not treated with µm eto (etomoxir, the inhibitor of carnitine palmitoyltransferase ), washed twice in pbs and resuspended in mitochondrial cellular respiration (mcr) buffer.
the rates of oxygen consumption were measured using intact cells in a high-resolution oxygraph (oxygraph- k; oroboros instruments, innsbruck, austria). oligomycin a ( . µg/ml) and fccp ( . µm) were sequentially added to measure the respiration leak state and the optimal non-coupled respiration, respectively. the data were recorded and treated using datlab software [ , , ].

mitochondrial activity assays

mtt and alamar blue. the parasites were washed twice and incubated in pbs supplemented with . mm palmitate in . % ffa bsa or with . % ffa bsa alone. media supplemented with mm glucose or mm histidine and non-supplemented media were used as positive and negative controls, respectively. the cell viability was evaluated at h and h after incubation using the mtt assay, as described in [ , ].

alamar blue. the parasites were washed twice and incubated in pbs or pbs supplemented with μm eto in -well plates. the plates were maintained at °c during all the experiments. after every h, the cells were incubated with . μg.ml- of alamar blue reagent and kept at °c for h under protection from light. the fluorescence was measured using the wavelengths λexc = nm and λem = nm in the spectramax® i (molecular devices) plate reader.

measurement of intracellular atp content

the intracellular atp levels were assessed using a luciferase assay kit (sigma-aldrich®), as described in [ – ]. in brief, the parasites were incubated in pbs supplemented (or not) with . mm palmitate, . % ffa bsa, mm glucose or mm histidine for h at °c.
the atp concentrations were determined by using a calibration curve with atp disodium salt (sigma), and the luminescence at nm was measured as indicated by the manufacturer.

enzymatic activities

carnitine palmitoyltransferase (cpt ). the epimastigotes were washed twice in pbs ( , x g, min at °c), resuspended in tris-edta buffer ( mm, . mm and . % triton x- ) containing µm phenylmethyl-sulphonyl fluoride (pmsf), . mm n-alpha-p-tosyl-lysyl-chloromethyl ketone (tlck), . mg aprotinin and . mm trans-epoxysuccinyl-l-leucyl amido ( -guanidino) butane (e- ) as protease inhibitors (sigma aldrich®) and lysed by sonication ( pulses for min each, %). the lysates were clarified by centrifugation at , x g for min at °c. the soluble fraction was collected and the proteins were quantified by the bradford method [ ] and adjusted to . mg/ml protein. the reaction mixture contained . mm l-carnitine, . mm palmitoyl-coa and . mm dtnb in tris-edta buffer (ph = . ). the cpt activity was measured spectrophotometrically at nm through the reaction of dtnb with free hs-coa, forming the tnb- ion. to calculate the specific activity, the absorbance values were converted into molarity by using the tnb- molar extinction coefficient of , m- .s- [ ]. as a blank, we performed the same assay without adding the substrate. all the enzymatic assays were performed in -well plates at a final volume of . ml in the spectramax® i (molecular devices).

acetyl-coa carboxylase (acc).
the acc activity was measured spectrophotometrically by coupling its enzymatic reaction with that of citrate synthase (cs), which uses oxaloacetate and acetyl-coa to produce citrate. end-point measurements were performed in two steps. first, the reaction mixture contained mm potassium phosphate buffer (ph = . ), mm khco , mm mncl , mm atp, mm acetyl-coa and . μm biotin. the reaction was initiated by adding . mg of cell extract and allowed to proceed for min at °c. the reaction was stopped by adding % (v/v) perchloric acid and centrifuged at , x g for min at °c. the second reaction was performed by using . ml of the supernatant from the first reaction, mm oxaloacetate and . mm of dtnb in mm potassium phosphate buffer (ph = . ). the reaction was initiated by adding . units of cs (sigma aldrich©). to calculate the specific activity of acc, we converted the absorbance values to molarity by using the tnb- molar extinction coefficient of , m- .s- . for the blank reaction, we performed the same assay without acetyl-coa [ ].

hexokinase (hk). the hk activity was measured as described in [ ]. briefly, the activity was measured by coupling the hexokinase activity with a commercial glucose- -phosphate dehydrogenase (g pd, sigma), which oxidizes the glucose- -phosphate resulting from the hk activity with the concomitant reduction of nadp+ to nadph. the resulting nadph was spectrophotometrically monitored at nm. the reaction mixture contained mm triethanolamine buffer ph . , mm mgcl , mm kcl, mm glucose, mm atp and u of commercial g pd. to calculate the specific activity, the absorbance values were converted to molarity using the nadp(h) molar extinction coefficient of , m- .s- .

serine palmitoyltransferase (spt). the spt activity was measured through the reaction of dtnb with free hs-coa, forming the tnb- ion, which was measured spectrophotometrically at nm as previously described [ ].
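The absorbance-to-molarity conversion used in the enzymatic assays above follows the Beer-Lambert law. The sketch below shows one way such a conversion could be carried through to a specific activity; the function name, the units, and every numerical value (absorbance slope, extinction coefficient, path length, reaction volume, protein amount) are hypothetical and chosen for illustration only, not taken from the study.

```python
def specific_activity(delta_abs_per_min, epsilon, path_cm, volume_l, protein_mg):
    """Specific activity in µmol of product per minute per mg of protein.

    delta_abs_per_min: blank-corrected absorbance change per minute
    epsilon: molar extinction coefficient, assumed here in M^-1 cm^-1
    path_cm: optical path length in cm
    volume_l: reaction volume in litres
    protein_mg: amount of protein in the reaction in mg
    """
    # Beer-Lambert law: A = epsilon * path * concentration
    molar_per_min = delta_abs_per_min / (epsilon * path_cm)  # mol/L per min
    umol_per_min = molar_per_min * volume_l * 1e6            # µmol per min
    return umol_per_min / protein_mg


# Hypothetical readings: the sample slope minus the slope of the blank
# (the same assay run without substrate).
sample_slope = 0.0313  # absorbance units per minute
blank_slope = 0.0030   # absorbance units per minute

activity = specific_activity(
    delta_abs_per_min=sample_slope - blank_slope,
    epsilon=14150.0,   # illustrative value only
    path_cm=1.0,
    volume_l=2e-4,
    protein_mg=0.01,
)
print(f"specific activity: {activity:.4f} µmol/min/mg")
```

In a plate reader the optical path length depends on the well geometry and fill volume, so it would need to be measured or corrected for rather than assumed to be 1 cm.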
in brief, the epimastigotes were washed twice in pbs, resuspended in tris-edta buffer ( mm/ . mm) containing . % triton x- and lysed by sonication ( % power, for min). the reaction mixture contained . mg of cell-free protein extract, . mm l-serine, . mm palmitoyl-coa and . mm dtnb in tris-edta buffer ( mm/ . mm) ph = . [ ]. to calculate the specific activity, we converted the absorbance values to molarity using the tnb- molar extinction coefficient of , m- .s- . for the blank reaction, we performed the same assay without adding palmitoyl-coa. all the enzymatic assays were performed in -well plates in a final volume of . ml in the spectramax® i (molecular devices).

glucose and triglyceride quantification

spent lit medium from epimastigote cultures was collected by recovering the supernatants after centrifugation ( , x g for min at °c). each sample of spent lit was analysed for its glucose and triglyceride contents using commercial kits (triglyceride monoreagent and glucose monoreagent by bioclin brazil) according to the manufacturer’s instructions. these kits are based on colorimetric enzymatic reactions, and the absorbance of each assay was measured in -well plates at a final volume of . ml in the spectramax® i (molecular devices).

proliferation assays

exponentially growing t. cruzi epimastigotes ( x ml- ) were treated with different concentrations of eto or not treated (negative control) in lit medium. as a positive control for growth inhibition, we used a combination of rotenone ( µm) and antimycin ( . µm) [ ]. the parasites ( .
x ml- ) were transferred to -well plates and then incubated at °c. the cell proliferation was quantified by reading the optical density (od) at nm for eight days. the od values were converted to cell numbers using a linear regression equation previously obtained under the same conditions. each experiment was performed in quadruplicate [ ].

flow cytometry analyses

cell death. epimastigotes in the exponential phase of growth were maintained in lit and treated with eto µm for days. after the incubation time, the parasites were analysed as described in [ ]. the cells were analysed by flow cytometry (facscalibur bd biosciences).

cell cycle (dna content). epimastigotes in the exponential phase of growth were maintained in lit and treated with eto µm over days. after the incubation time, the parasites were washed twice in pbs and resuspended in lysis buffer (phosphate buffer na hpo . mm; kh po . mm; ph = . ) and digitonin µm. after incubating on ice for min, propidium iodide . μg/ml was added. the samples were analysed by flow cytometry (guava), with a protocol adapted from [ ].

fatty acid staining using bodipy® / . exponentially growing epimastigotes were kept in lit medium to reach three different cell densities ( . x ml- , x ml- and ml- ) in -well plates at °c. twenty-four hours before the flow cytometry analysis, the parasites were treated with µm c -bodipy® / -c . this fluorophore allows for measurements of the relationship between fatty acid accumulation and consumption by shifting the fluorescence filter. the samples were collected, washed twice in pbs and incubated in % paraformaldehyde for min. after incubation, the cells were washed twice with pbs and suspended in the same buffer. flow cytometry analysis was performed with fl- and fl- filters in a facs fortessa db®. the results were analysed using flowjo software.

fluorescence microscopy

the parasites were maintained in lit medium as previously reported for fatty acid staining using bodipy® / . after incubation, the cells were washed twice in pbs and placed on glass slides. the images were acquired with a digital dfc fx camera coupled to a dmi b/af microscope (leica). the images were analysed using imagej software.

results

palmitate supports atp synthesis in t. cruzi

we initially investigated the ability of t. cruzi epimastigotes to oxidize fatty acids. to this end, we used palmitate as a proxy for fatty acids in general. the parasites were incubated with . mm c-[u]-palmitate, which allowed us to measure the production of . nmoles of co derived from palmitate oxidation during the first min and . extra nmoles during the following min (fig a). this finding indicated that beta-oxidation and the further 'burning' of the resulting acetyl-coa are operative in epimastigote mitochondria. because palmitate is taken up from the extracellular medium and oxidized to co , it is reasonable to assume that it could contribute to resistance to severe nutritional stress. to test this idea, we assessed the ability of palmitate to extend parasite survival under extreme nutritional stress. parasites were incubated for and h in pbs (negative control, the condition in which we expected the lowest viability after the incubations), . mm palmitate in pbs supplemented with bsa (as a palmitate carrier), . mm histidine in pbs or . mm glucose in pbs (both positive controls, since both metabolites are well known to extend the parasites' viability under metabolic stress conditions, see [ ]). as an additional negative control, we used pbs supplemented with bsa without added palmitate.
the viability of these cells was assayed by measuring the total reductive activity by mtt assay. additionally, we measured the total atp levels. cells incubated in the presence of palmitate showed higher viability than the negative controls, but not as high as that of parasites incubated with glucose or histidine (fig b). consistently, parasites incubated in the presence of palmitate showed higher atp contents than both negative controls. however, the intracellular atp levels in the cells incubated with palmitate were half of those in parasites incubated with histidine. interestingly, palmitate kept the atp content at levels comparable to those obtained with glucose (fig c).

figure . palmitate oxidation promotes atp production and viability in epimastigote forms under starvation. schematic representation of c-u-palmitate metabolism. the metabolites corresponding to labelled palmitate metabolism are presented in green. a) co production from epimastigotes incubated in pbs with c-u-palmitate µm. the co was captured at , , and min. b) viability of epimastigote forms after incubation with different carbon sources and palmitate. the viability was assessed after and h by mtt assay. c) the intracellular atp content was evaluated following incubation with different energy substrates or not (pbs, negative control). the atp concentration was determined by luciferase assay and the data were adjusted by the number of cells. a statistical analysis was performed with one-way anova followed by tukey's post-test at p < . using the graphpad prism . . software program.
we represent the level of statistical significance in this figure as follows: *** p value < . ; ** p value < . ; * p value < . . for a p value > . we consider the differences to be not significant (ns). epimastigote forms excrete acetate as a primary end-product of palmitate oxidation because the epimastigotes were able to oxidize c-u-palmitate to co , we were interested in analysing their exometabolome and comparing it with that of parasites exclusively consuming glucose, palmitate or without any carbon source. thus, we subjected exponentially growing parasites to h of starvation and then incubated them for h in the presence of . mm palmitate, mm c-u-glucose or without any carbon source. for the control, we analysed a sample of non-starved parasites. after the incubations, the extracellular media were collected and analysed by h-nmr spectrometry. as expected, all the incubation conditions produced different flux profiles for excreted metabolites (fig and s fig). under our experimental conditions, the non-starved parasites primarily excreted succinate and acetate in similar quantities, and alanine and lactate to a lesser extent. parasites starved for h in pbs and left to incubate in the absence of other metabolites had diminished succinate production (~ -fold) but increased acetate production three-fold compared to the non-starved parasites. it is relevant to stress that the only possible origin for these metabolites are internal carbon sources (ics). notably, no other excreted metabolites were detected under these conditions, indicating that under starvation, most of the ics are transformed into acetate as an end product, which is compatible with the oxidation of internal fatty acids. these results raise the question about the metabolic fates of glucose or fatty acids in previously starved parasites. 
starved epimastigotes that recovered in the presence of glucose exhibited a profuse excretion of succinate ( -fold the quantity excreted by the starved cells) and roughly equivalent quantities of acetate compared with the starved cells. interestingly, lactate and alanine were also excreted at similar levels. as expected, the recovery with glucose produced an increase in all the secreted metabolites. however, their distribution indicates a reconfiguration of the metabolism towards a predominant production of succinate. finally, in epimastigotes incubated with palmitate, we observed an increase in the acetate and alanine production to approximately . times the levels in parasites that recovered in the presence of glucose. interestingly, succinate is excreted in a smaller quantity than acetate and alanine, but still at -fold the rate observed in the starved non-recovered cells. surprisingly, there was also a significant production of pyruvate (not previously described in the literature, and not observed under any other conditions) and a small amount of lactate derived from palmitate.

figure . excreted end products of glucose and palmitate metabolism in epimastigote forms of t. cruzi. a) the extracellular medium of epimastigote forms incubated under different conditions was analysed by h-nmr spectrometry to detect and quantify the end-products. the resulting data were expressed in nmoles/h/ cells. means ± sd of three independent experiments. ics is internal carbon sources; nd is non-detectable.
b) and c) schematic representation of the contribution of glucose and palmitate to the metabolism of epimastigote forms of t. cruzi. the glycosomal compartment and tca cycle are indicated. the amount of each end-product is indicated by the font size. numbers indicate enzymatic steps. . glycolysis; . pyruvate dehydrogenase; . citrate synthase; . aconitase; . isocitrate dehydrogenase; . α-ketoglutarate dehydrogenase; . succinyl-coa synthetase; . succinate dehydrogenase/complex ii/fumarate reductase nadh-dependent; . fumarate hydratase; . malate dehydrogenase; . malic enzyme; . alanine dehydrogenase/alanine aminotransferase; . lactate dehydrogenase; . acetate:succinyl-coa transferase; . acetyl-coa hydrolase; . succinyl-coa synthetase; . glycosomal fumarate reductase and . palmitate oxidation by beta-oxidation, resulting in fadh , nadh and acetyl-coa; abbreviations: cit: citrate, aco: aconitate, isoc: isocitrate, α-kg: α-ketoglutarate, suc-coa: succinyl-coa, suc: succinate, fum: fumarate, mal: malate, and oxa: oxaloacetate.

glucose metabolism represses the fatty acid oxidation in epimastigotes

glucose is the primary carbon source for exponentially proliferating epimastigotes, and after its exhaustion from the culture medium, the parasites change their metabolism to use amino acids as carbon sources preferentially [ ]. therefore, we were interested in analysing if this preference for glucose is maintained in relation to the consumption of lipids. to determine if glucose metabolism interferes with the consumption of fatty acids, we created a h proliferation curve using parasites with an initial concentration adjusted to . x ml- and quantified them for h each. under these conditions, at h from the beginning of the experiment the parasites are at mid-exponential phase, at h they are at late exponential phase, and at h they reach stationary phase at a concentration of x ml- (fig a).
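cell densities on a growth curve such as this one are obtained from optical density readings through a calibration line fitted beforehand (see the proliferation assays in the methods). the sketch below shows one way such a conversion could be written, using an ordinary least-squares fit; the function names and the calibration points are hypothetical and are not the authors' data or code.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def od_to_cells(od, slope, intercept):
    """Convert an optical density reading to cells per ml with the fitted line."""
    return slope * od + intercept


# Hypothetical calibration: OD readings paired with counted cell densities.
calib_od = [0.1, 0.2, 0.4, 0.8]
calib_cells = [1.0e6, 2.0e6, 4.0e6, 8.0e6]
slope, intercept = fit_line(calib_od, calib_cells)
print(f"estimated density at od 0.3: {od_to_cells(0.3, slope, intercept):.3g} cells/ml")
```

a calibration of this kind is only valid within the od range covered by the calibration points and under the same culture and reader conditions.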
at h, h and h, the culture medium was collected to measure the remaining glucose and triacylglycerol (tags) concentrations (figs b and c). most of the glucose was consumed during the first h (during proliferation), while the concentration of tags remained the same. after h of proliferation (stationary phase), the tag levels and lipid contents of the droplets were decreased by . -fold and -fold, respectively, suggesting that glucose is preferentially consumed relative to fatty acids. these data show a decrease in the extracellular tags between and h, while the glucose was already almost entirely consumed, suggesting that glucose is negatively regulating the fatty acid catabolism.

figure . changes in glucose and triacylglycerol contents in lit medium. a) growth curve of epimastigote forms. b) glucose quantification over h. c) triacylglycerol levels over h. in each experiment, we collected each medium at different times and subjected it to quantification according to the manufacturer's instructions. all the experiments were performed in triplicates. statistical analysis was performed with one-way anova followed by tukey's post-test p < . using the graphpad prism . . software program. we represent the levels of statistical significance in this figure as follows: *** p value < . ; ** p value < . ; and * p value < . . for p value > . , we consider the differences not significant (ns).
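the figure legends report a one-way anova followed by tukey's post-test, run in graphpad prism. as a rough illustration of what the anova step computes, the sketch below derives the f statistic from the between-group and within-group sums of squares. it is not the authors' analysis: the function name and the triplicate values are invented, and the sketch stops at the f statistic, since p-values and tukey's test need the f and studentized range distributions from a statistics package.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and replicated data")

    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the replicates around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented triplicates for three time points, for illustration only.
f_stat, df_b, df_w = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

a large f value means the group means differ by more than the replicate scatter would explain; the post-test then identifies which pairs of groups differ.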
epimastigote forms use endogenous fatty acids to support growth after glucose exhaustion from the previous results, we learned that under glucose deprivation, tags are taken up by the epimastigotes, and internally stored fatty acids are mobilized. however, to date, we did not provide any evidence pointing to their use as reduced carbon sources. to confirm this idea, exponentially proliferating epimastigotes were incubated in pbs supplemented with palmitate and c-u-glucose, or reciprocally, glucose and c-u-palmitate. in both cases, the production of c- labelled co was quantified. the presence of mm glucose diminished the release of co from c-u-palmitate by % while the presence of palmitate did not interfere with the production of co from c-u-glucose (fig ). taken together, our results show that glucose inhibits tags and fatty acid consumption, and after glucose exhaustion, a metabolic switch occurs towards the oxidation of internally stored fatty acids. figure . glucose metabolism inhibits fao. parasites were incubated in the presence of c-u- palmitate + mm glucose and c-u-glucose + . mm palmitate in pbs. co production from epimastigotes incubated in pbs. the co was captured after min of incubation. the experiments were performed in triplicates. statistical analysis was performed with one-way anova followed by tukey's post-test p < . using the graphpad prism . . software program. we represent the level of statistical significance in this figure as follows: *** p value < . ; ** p value < . ; and * p value < . . for p value > . , we consider the differences not significant (ns). to monitor the dynamics of use or accumulation of fatty acids in lipid droplets, we used as a probe a fluorescent fatty acid analogue called bodipy / c -c . bodipy shifts its fluorescence from red to green upon the uptake and catabolism of fatty acids, and from green to red when fatty acids are accumulated in the lipid droplets. 
parasites collected at the mid and late exponential proliferation phases and the stationary phase were incubated with μm bodipy / c -c for h, before fluorescence determination by flow cytometry (figs a, b and c). the fluorescence values increased with the harvesting time (and therefore, with the glucose depletion), indicating the increased uptake and use of fatty acids as substrates by a fatty acyl-coa synthetase. these data were confirmed by fluorescence microscopy (fig d). interestingly, parasites in stationary phase showed an accumulation of activated fatty acids in spots along the cell. however, the number of lipid droplets increased upon parasite proliferation (figs a, b and c). this observation indicates that not only is fatty acid metabolism activated after glucose exhaustion, but so is the storage of fatty acids in lipid droplets.

figure . flow cytometry reveals distinct patterns in fatty acid pools during epimastigote growth. the epimastigotes were treated with µm of bodipy c -c ( / ) and analysed by flow cytometry and fluorescence microscopy. a) h. b) h. c) h. in the flow cytometry histograms, dashed peaks represent unstained parasites. green-filled peaks represent stained parasites. d) mean fluorescence per cell. the fluorescence for each cell was calculated using imagej software. all the experiments were performed in triplicates. statistical analysis was performed with one-way anova followed by tukey's post-test p < . using the graphpad prism . . software program. we represent the level of statistical significance in this figure as follows: *** p value < .
; ** p value < . ; and * p value < . . for p value > . , we consider the differences not significant (ns).

figure . epimastigote forms accumulate fatty acids into lipid droplets during growth. the epimastigotes were treated with µm bodipy c -c ( / ) and analysed by flow cytometry and fluorescence microscopy. a) h. b) h. c) h. in the flow cytometry histograms, dashed peaks represent unstained parasites. yellow filled peaks represent positively stained parasites. the number of green/yellow spots for each cell was calculated using imagej software. all the experiments were performed in triplicates.

to determine whether the increase in fatty acid pools is accompanied by a change in the levels of enzymes related to fatty acid metabolism, we evaluated the specific activities of the enzymes hexokinase (hk), which is responsible for the initial step of glycolysis and an indicator of active glycolysis; acetyl-coa carboxylase (acc), which produces malonyl-coa for fatty acid synthesis; and carnitine palmitoyltransferase (cpt ), the complex that plays a central role in fatty acid oxidation (fao) by controlling the entrance of long-chain fatty acids into the mitochondria [ ]. for the control, we selected the enzyme serine palmitoyltransferase (spt ), a constitutively expressed protein in t. cruzi [ ] (fig ). the hexokinase activity diminished up to % with the progression of the proliferation curve and the correlated depletion of glucose (fig a). in addition, the acc activity is no longer detectable in the stationary phase cells (fig b). by contrast, the cpt activity is increased by ~ -fold when the stationary phase is reached (fig c), which confirms that fatty acid degradation occurs in the absence of glucose. it is noteworthy that the high levels of acc activity in the presence of glucose support the idea that under these conditions, fatty acids are probably synthesized instead of being catabolized. as expected, the spt activity did not change during the analysed time frame (fig d).

figure .
activities of enzymes related to lipid and glucose metabolism during t. cruzi growth curves. a) (hk) hexokinase b) (acc) acetyl-coa carboxylase, c) (cpt ) carnitine-palmitoyltransferase, and d) (spt) serine palmitoyltransferase. all these activities were measured in crude extracts from epimastigote forms at different moments of the growth curve. all the experiments were performed in triplicates. time course activities and controls shown in fig s . statistical analysis was performed with one-way anova followed by tukey's post-test at p < . , using the graphpad prism . . software program. we represent the level of statistical significance in this figure as follows: *** p value < . ; ** p value < . ; and * p value < . . for p value > . we consider the differences not significant (ns).

etomoxir, a cpt inhibitor, affects t. cruzi proliferation and mitochondrial activity

to investigate the role of fao in t. cruzi, we tested the effect of a well characterized inhibitor of cpt , etomoxir (eto), on the proliferation of epimastigotes. among the eto concentrations tested here (from . to µm), only the highest concentration arrested parasite proliferation (fig a). importantly, the eto effect was manifested when the parasites reached the late exponential phase (a cell density of approximately x ml- ). this result is consistent with our previous findings showing that fao (and thus cpt activity) acquires an important role at this point in the proliferation curve. to confirm that cpt is in fact a target of eto in t. cruzi, we assayed the drug's effect on the enzyme activity in cell-free extracts.
our results showed that µm eto diminished the cpt activity by almost % (fig b). to confirm the interference of eto with the beta-oxidation of fatty acids, parasites incubated in pbs containing c-u-palmitate were treated with µm eto to compare their production of co with that of the untreated controls. palmitate-derived co production diminished by % in eto-treated cells compared to untreated parasites (fig c). in addition, eto treatment did not affect the metabolism of c-u-glucose or c-u-histidine, ruling out a possible unspecific reaction of this drug with coa-sh as described by [ ]. other compounds described as fao inhibitors were also tested, but none of them inhibited epimastigote proliferation or co production from c-u-palmitate (s fig). in addition, the bodipy cytometric analysis of cells treated with µm eto showed a strong decrease in the coa acylation levels (activation of fatty acids) with respect to the untreated controls (fig d), as confirmed by fluorescence microscopy (fig d). to reinforce the validation of eto for further experiments, a set of controls is provided in s fig. our preliminary conclusion is that eto inhibited beta-oxidation by inhibiting cpt , confirming that the breakdown of fatty acids is important to proliferation progression in the absence of glucose.

figure . eto inhibits cpt and interferes with cell proliferation in epimastigote forms. (a) proliferation of epimastigote forms in the presence of . to µm eto. for the positive control of dead cells, a combination of antimycin ( . µm) and rotenone ( µm) was used. (b) inhibition of cpt activity in crude extracts using and µm of eto. c) co capture from c-u-palmitate oxidation. d) flow cytometry analysis and fluorescence microscopy of epimastigote forms treated (or not) with eto. in the histograms, dashed peaks represent unstained parasites and green-filled peaks represent parasites stained with bodipy c -c . all the experiments were performed in triplicates. statistical analysis was performed with one-way anova followed by tukey's post-test at p < . using the graphpad prism . . software program. we represent the level of statistical significance in this figure as follows: *** p value < . ; ** p value < . ; and * p value < . . for p values > . , we consider the differences not significant (ns).

etomoxir treatment affects cell cycle progression

the metabolic interference of eto diminished epimastigote proliferation; however, this finding could be due to a decrease in the parasite proliferation rate or an increase in the death rate. therefore, we checked if this compound could induce cell death through programmed cell death (pcd) or necrosis. pcd is characterized by biochemical and morphological events such as exposure of phosphatidylserine, dna fragmentation, decreases (or increases) in the atp levels, and increases in reactive oxygen species (ros), among others [ ]. the parasites were treated with µm of eto for days, followed by incubation with propidium iodide (pi) for cell membrane integrity analysis and annexin-v fitc to evaluate the phosphatidylserine exposure. parasites treated with eto showed negative results for necrosis or programmed cell death markers (fig a), indicating that the cell proliferation was arrested but cell viability was maintained. because the multiplication rates seemed to be diminished, we performed a cell cycle analysis. noticeably, the treated parasites were enriched in g ( . %) with respect to non-treated cells ( .
%), suggesting that eto prevented the entry of epimastigotes into the s phase of the cell cycle (fig b). last, we noticed that after washing out the eto, the parasites recovered their proliferation at rates comparable to our untreated controls (fig c).

figure . analysis of extracellular phosphatidylserine exposure, membrane integrity and cell cycle after eto treatment. parasites in the exponential growth phase were treated with µm of eto for days. (a) following the incubation period, the parasites were labelled with propidium iodide (pi) and annexin v-fitc (anx) and analysed by flow cytometry. (b) the cell cycle was assessed using pi staining. (c) growth curves of epimastigote forms before and after removing the treatment. all the experiments were performed in triplicates. statistical analysis was performed with one-way anova followed by tukey's post-test p < . , using the graphpad prism . . software program. we represent the level of statistical significance in this figure as follows: *** p value < . ; ** p value < . ; and * p value < . . for p values > . , we consider the differences not significant (ns).

inhibition of fao by eto affects energy metabolism, impairing the consumption of endogenous fatty acids

the evidence obtained to date suggests that parasites resist metabolic stress by mobilizing and consuming stored fatty acids. therefore, it is reasonable to hypothesize that eto, which blocks the mobilization of fatty acids into the mitochondria for oxidation, probably perturbs the atp levels in late-exponential or stationary phase cells.
parasites growing for days under µm eto treatment or no treatment were collected to evaluate the ability of parasites that were treated or not with eto to trigger oxygen consumption. the rates of o consumption corresponding to basal respiration were measured in cells resuspended in mcr respiration buffer. we then measured the leak respiration by inhibiting the atp synthase with oligomycin a. finally, to measure the maximum capacity of the electron transport system (ets), we used the uncoupler fccp [ ]. our results demonstrate that compared to no treatment, eto treatment diminishes the rate of basal oxygen consumption, the leak respiration and the ets capacity. in general, respiratory rates diminished in parasites treated with eto when compared to the untreated ones. as expected, eto treatment led to a % decrease in the levels of total intracellular atp compared to untreated parasites (fig a). to complement this result, because all these experiments were conducted in the complete absence of an oxidizable external metabolite, our results show that the parasite is able to oxidize internal metabolites (figs b and c). taking into account that treating parasites with eto diminished the basal respiration rates of these parasites by approximately one-half (figs b and c), it is reasonable to conclude that a relevant part of the respiration in the absence of external oxidisable metabolites is based on the consumption of internal lipids. this is consistent with the confirmation that epimastigotes maintain their viability in the presence of non-fatty acid carbon sources in the presence of eto (s fig). in summary, these results confirm that eto is interfering with atp synthesis through oxidative phosphorylation in epimastigote forms. figure . effects of eto on respiration and atp production in epimastigote forms of t. cruzi. (a) oxygen consumption of epimastigote forms after normal growth in lit medium. (b) oxygen consumption after eto μm treatment. 
parasite growth in lit medium with the compound until the th day. in black, a time-course register of the concentration (pmols) of o in the respiration chamber. in blue, the negative of the derivative of the o concentration (pmols) with respect to time (rate of o consumption in pmoles per second). the parasites were washed twice in pbs and kept in mrc buffer at °c during the assays (see materials and methods for more details). (c) the basal respiration (initial oxygen flux values, mrc), respiration leak after the sequential addition of . µg/ml of oligomycin a ( µg/ml), and electron transfer system (ets) capacity after the sequential addition of . µm fccp ( µm) were measured for each condition. (d) intracellular levels of atp after treating with µm eto. the intracellular atp content was assessed following incubation with different energy substrates or not (pbs, negative control). the atp concentration was determined by luciferase assay and the data were adjusted by the number of cells. all the experiments were performed in triplicates. statistical analysis was performed with one-way anova followed by tukey's post-test at p < . using graphpad prism . . software. we represent the level of statistical significance as follows: *** p value < . ; ** p value < . ; and * p value < . . for p values > . , we consider the differences not significant (ns).

endogenous fatty acids contribute to long-term starvation resistance in epimastigote forms

as previously demonstrated, eto interferes with the consumption of endogenous fatty acids, and this impairment causes atp depletion and cell cycle arrest.
one intriguing characteristic of the insect stages of t. cruzi is their resistance to starvation. to observe the importance of internal fatty acids in this process, we incubated epimastigotes in pbs in the presence (or absence) of μm eto. the mitochondrial activity of these cells was followed for h with alamar blue®. our results showed that the mitochondrial activity of the parasites in the presence of eto was reduced by % after h of starvation, and % after h of starvation (fig. ) compared to the controls (untreated parasites). these data confirmed our hypothesis that the breakdown of accumulated fatty acids partially contributes to the resistance of the parasite under severe starvation. figure . internal fatty acid consumption contributes to parasite viability under severe nutritional starvation. viability of epimastigote forms after incubation in pbs with or without eto. the viability was assessed every h using alamar blue®. statistical analysis was performed with one-way anova followed by tukey's post-test p < . using graphpad prism . . software. we represent the levels of statistical significance as follow: *** p value < . , and for p values > . , we consider the differences not significant (ns). inhibition of cpt impairs metacyclogenesis considering that the fao increases in the epimastigotes during the stationary phase, and that differentiation into infective metacyclic trypomastigotes (metacyclogenesis) is triggered in the stationary phase of epimastigote parasites, one might expect a possible relationship between the consumption of fatty acids and metacyclogenesis. to approach this possibility, we initially compared the cpt activity of stationary epimastigote forms before and after a h incubation in the differentiation medium tau- aag. as observed, there is an increase in cpt activity after submitting the parasites to the metacyclogenesis in vitro (fig. a). 
parasites were then submitted to differentiation in tau- aag medium in the presence of the probe bodipy. the probe was incorporated into lipid droplets, confirming that fatty acid metabolism was active at the beginning of metacyclogenesis (fig. b). to address the importance of fao during differentiation, metacyclogenesis was induced in vitro in eto-treated or untreated (control) parasites. eto treatment interfered with differentiation, diminishing the number of metacyclic forms present in the culture (fig. c). in addition, this inhibition was dose-dependent, with an ic = + . µm (fig. d). importantly, we ruled out that the variation found in the differentiation rates was due to a selective death of treated epimastigotes, since their survival during this experiment in the presence or absence of eto (from to µm) was not significantly different (s fig). based on these data, we conclude that fatty acid oxidation, at the level of cpt , also participates in the regulation of metacyclogenesis.

figure . eto inhibits metacyclogenesis. a) cpt activity of epimastigote forms in stationary phase and h after incubation in tau- aag medium (for triggering metacyclogenesis). b) fluorescence microscopy of cells incubated in tau- aag in the presence of bodipy® - c -c . c) effects of different eto concentrations on metacyclogenesis. the differentiation was evaluated by counting the cells in a neubauer chamber each day for days. this experiment was performed in triplicate. d) percentage of differentiation at the th day of differentiation. inset: ic of metacyclogenesis inhibition by eto.
the enzymatic activities were measured in duplicate. all the other experiments were performed in triplicate.

discussion

during the journey of t. cruzi inside the insect vector, the glucose levels decrease rapidly after each blood meal [ ], leaving the parasite exposed to an environment rich in amino acids and fatty acids in the digestive tube of rhodnius prolixus [ , ]. because the digestive tract of triatomine insects possesses a perimicrovillar membrane, which is composed primarily of lipids and is enriched in glycoproteins [ ], it has been speculated that its degradation could provide lipids for parasite metabolism [ ]. in this study, we showed that the insect stages of t. cruzi coordinate the activation of fatty acid consumption with the metabolism of glucose. our experiments corroborate early studies on the relatively slow use of palmitate as an energy source by proliferating epimastigotes [ , ]. in addition, our results shed light on end product excretion by epimastigote forms during incubation under starvation conditions, and during their recovery from starvation using glucose or palmitate. first, we showed that non-starved and starved parasites recovered in the presence of glucose, excreting succinate as their primary metabolic waste, as expected [ – ]. after h of nutritional starvation, the consumption of internal carbon sources produces acetate as the primary end product. in the presence of glucose after h of starvation, we found that glucose-derived carbons contribute to the excreted pools of acetate and lactate. interestingly, palmitate metabolism contributed to the increase in acetate production, followed by the production of alanine, pyruvate, succinate and lactate. the unexpected production of alanine, pyruvate and lactate can be explained by an increase in tca cycle activity, producing malate, which can be converted into pyruvate by the decarboxylative reaction of the malic enzyme (me) [ ].
pyruvate can be converted into alanine through a transamination reaction by an alanine- [ ], a tyrosine- [ ] or an aspartate aminotransferase [ ], or through reductive amination by an alanine dehydrogenase [ ]. the excretion of lactate could be a consequence of lactate dehydrogenase activity. however, it should be noted that this enzymatic activity has not been observed to date. in relation to succinate production, a relevant factor favouring this process is the production of nadh by the third step of beta-oxidation ( -hydroxyacyl-coa dehydrogenase). this nadh can be oxidized through the activity of nadh-dependent mitochondrial fumarate reductase [ ], which concomitantly converts nadh into nad+ and fumarate into succinate. this succinate can be excreted or re-used by the tca cycle, and the resulting nad+ can be used as a cofactor for other enzymes.

as previously mentioned, it is well known that during the initial phase of proliferation, epimastigotes preferentially consume glucose, and during the stationary phase, a metabolic switch occurs towards the consumption of amino acids [ , , ]. our results show that this switch constitutes a broader and more systemic metabolic reprogramming, which also includes fao. we detected this switch through changes in the activities of key enzymes responsible for the regulation of fao, such as cpt and acc, which have increased and decreased activities, respectively, in the presence of glucose. our findings showed that the inhibition of cpt affects the late phase of proliferation of epimastigotes, when the switch to fao has already occurred.
an interesting question about t. cruzi epimastigotes is how they survive long periods of starvation. early data showed high respiration levels in epimastigotes incubated in the absence of external oxidisable carbon sources. this oxygen consumption was attributed to the breakdown of tags into free fatty acids and their further oxidation [ ]. here, we confirmed this finding by inhibiting the internal fatty acid consumption, which in turn diminished the oxidative phosphorylation activity, internal atp levels and the total reductive activity of parasites under severe nutritional stress. even more notably, we showed that under these conditions, the lipids stored in lipid droplets [ , ] are consumed. unlike what has been observed in procyclic forms of t. brucei, in which the function of lipid droplets is not clear [ ], our results show that in t. cruzi, they contribute to epimastigote survival under extreme metabolic stress. of course, the contribution of other metabolic sources and processes such as autophagy in coping with nutritional stress cannot be ruled out [ ].

multiple metabolic factors have been implicated in metacyclogenesis, such as proline, aspartate, glutamate [ ], glutamine [ ] and lipids present in the triatomine digestive tract [ ]. interestingly, the occurrence of metacyclic trypomastigotes in culture leads to an increase in co production from labelled palmitate [ ]. the eto treatment inhibited metacyclogenesis in vitro, showing that the consumption of internal fatty acids is important for cell differentiation. consequently, we propose that lipids are not only external signals of metacyclogenesis, as previously suggested [ ], but also have a central role in the bioenergetics of metacyclogenesis. as in the oxidation of several amino acids, the acetyl-coa produced from beta-oxidation, and probably the reduced cofactors resulting from these processes, contribute to the mitochondrial atp production necessary to support this differentiation step.
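The dose-dependent inhibition of metacyclogenesis by eto is summarised in the results as an IC50 value. The text does not state how that value was obtained, so the sketch below is only an illustration of one simple way to read a half-maximal concentration off dose-response data: linear interpolation on a log10 concentration axis between the two tested doses that bracket 50% of the control response. A fitted sigmoidal model is the more usual choice. The function name `ic50_by_interpolation` and all numbers are hypothetical.

```python
import math


def ic50_by_interpolation(concentrations, responses):
    """Estimate the concentration that gives a 50% response.

    concentrations: positive inhibitor concentrations in increasing order.
    responses: response at each concentration, as a percentage of the
        untreated control, assumed to fall as the concentration rises.
    The estimate interpolates linearly in log10(concentration) between
    the first pair of neighbouring points that bracket 50%.
    """
    pairs = list(zip(concentrations, responses))
    for (c_lo, r_lo), (c_hi, r_hi) in zip(pairs, pairs[1:]):
        if r_lo >= 50.0 >= r_hi and r_lo != r_hi:
            fraction = (r_lo - 50.0) / (r_lo - r_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + fraction * (log_hi - log_lo))
    raise ValueError("responses do not cross 50% within the tested range")


# Hypothetical dose-response values (not data from the paper).
doses_um = [1.0, 10.0, 100.0]
percent_of_control = [90.0, 60.0, 40.0]
print(round(ic50_by_interpolation(doses_um, percent_of_control), 1))  # 31.6
```

Interpolation uses only the two bracketing points, so it is sensitive to noise in those measurements; it is shown here because it needs no curve-fitting library.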
in conclusion, fatty acids are important carbon sources for t. cruzi epimastigotes in the absence of glucose. palmitate can be taken up by the cells and fuel the tca cycle by producing acetyl-coa, the oxidation of which generates co . however, in the absence of external carbon sources, lipid droplets become the primary sources of fatty acids, helping the organism to survive nutritional stress. importantly, fao supports endogenous respiration rates and atp production and powers metacyclogenesis.

acknowledgements

we thank the core facility for scientific research at the university of sao paulo (cefap-usp/fluir) for the flow cytometry analysis and dr. mauro javier veliz cortez (department of parasitology, icb-usp) for the microscopy work.

references

[ ] who | chagas disease (american trypanosomiasis), who. ( ). https://www.who.int/chagas/en/ (accessed january , ).
[ ] j.a. perez-molina, i. molina, chagas disease, lancet. ( ) – . https://doi.org/ . /s - ( ) - .
[ ] r. de f.p. melo, a.a. guarneri, a.m. silber, the influence of environmental cues on the development of trypanosoma cruzi in triatominae vector, front. cell. infect. microbiol. ( ) . https://doi.org/ . /fcimb. . .
[ ] w. de souza, basic cell biology of trypanosoma cruzi, curr. pharm. des. ( ) – . http://www.ncbi.nlm.nih.gov/pubmed/ .
[ ] p. lisvane silva, b.s. mantilla, m.j. barison, c. wrenger, a.m.
silber, the uniqueness of the trypanosoma cruzi mitochondrion: opportunities to identify new drug target for the treatment of chagas disease, curr pharm des. ( ) – . https://www.ncbi.nlm.nih.gov/pubmed/ .
[ ] c. bern, chagas’ disease, n engl j med. ( ) . https://doi.org/ . /nejmc
[ ] y. li, s. shah-simpson, k. okrah, a.t. belew, j. choi, k.l. caradonna, p. padmanabhan, d.m. ndegwa, m.r. temanni, h. corrada bravo, n.m. el-sayed, b.a. burleigh, transcriptome remodeling in trypanosoma cruzi and human cells during intracellular infection, plos pathog. ( ) e . https://doi.org/ . /journal.ppat. .
[ ] l. marchese, j. nascimento, f. damasceno, f. bringaud, p. michels, a. silber, the uptake and metabolism of amino acids, and their unique role in the biology of pathogenic trypanosomatids, pathogens. ( ) . https://doi.org/ . /pathogens .
[ ] j.j. cazzulo, energy metabolism in trypanosoma cruzi, subcell biochem. ( ) – . https://www.ncbi.nlm.nih.gov/pubmed/ .
[ ] m.j. barison, l.n. rapado, e.f. merino, e.m. furusho pral, b.s. mantilla, l. marchese, c. nowicki, a.m. silber, m.b. cassera, metabolomic profiling reveals a finely tuned, starvation-induced metabolic switch in trypanosoma cruzi epimastigotes, j biol chem. ( ) – . https://doi.org/ . /jbc.m . .
[ ] r. zeledon, comparative physiological studies on four species of hemoflagellates in culture. ii. effect of carbohydrates and related substances and some amino compounds on the respiration, j. parasitol. ( ) . https://doi.org/ . / .
[ ] d. sylvester, s.m. krassner, proline metabolism in trypanosoma cruzi epimastigotes, comp biochem physiol b.
( ) – . https://www.ncbi.nlm.nih.gov/pubmed/ .
[ ] l.s. paes, b. suarez mantilla, f.m. zimbres, e.m. pral, p. diogo de melo, e.b. tahara, a.j. kowaltowski, m.c. elias, a.m. silber, proline dehydrogenase regulates redox state and respiratory metabolism in trypanosoma cruzi, plos one. ( ) e . https://doi.org/ . /journal.pone. .
[ ] b.s. mantilla, l.s. paes, e.m.f. pral, d.e. martil, o.h. thiemann, p. fernández-silva, e.l. bastos, a.m. silber, role of Δ -pyrroline- -carboxylate dehydrogenase supports mitochondrial metabolism and host-cell invasion of trypanosoma cruzi, j. biol. chem. ( ). https://doi.org/ . /jbc.m . .
[ ] m.j. barisón, f.s. damasceno, b.s. mantilla, a.m. silber, the active transport of histidine and its role in atp production in trypanosoma cruzi, j. bioenerg. biomembr. ( ) – . https://doi.org/ . /s - - - .
[ ] r.m.b.m. girard, m. crispim, m.b. alencar, a.m. silber, uptake of l-alanine and its distinct roles in the bioenergetics of trypanosoma cruzi, msphere. ( ). https://doi.org/ . /mspheredirect. - .
[ ] f.s. damasceno, m.j. barisón, m. crispim, r.o.o. souza, l. marchese, a.m. silber, l-glutamine uptake is developmentally regulated and is involved in metacyclogenesis in trypanosoma cruzi, mol. biochem. parasitol. ( ). https://doi.org/ . /j.molbiopara. . . .
[ ] e.p. camargo, growth and differentiation in trypanosoma cruzi. i. origin of metacyclic trypanosomes in liquid media, rev. inst. med. trop. sao paulo. ( ) – . http://www.ncbi.nlm.nih.gov/pubmed/ (accessed january , ).
[ ] f.s. damasceno, m.j. barison, m. crispim, r.o.o. souza, l. marchese, a.m.
silber, l-glutamine uptake is developmentally regulated and is involved in metacyclogenesis in trypanosoma cruzi, mol biochem parasitol. ( ) – . https://doi.org/ . /j.molbiopara. . . .
[ ] f.k. huynh, m.f. green, t.r. koves, m.d. hirschey, measurement of fatty acid oxidation rates in animal tissues and cell lines, methods enzymol. ( ) – . https://doi.org/ . /b - - - - . - .
[ ] m.b. alencar, r.b.m.m. girard, a.m. silber, measurement of energy states of the trypanosomatid mitochondrion, methods mol. biol. ( ) – . https://doi.org/ . / - - - - _ .
[ ] m.m. bradford, a rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding, anal. biochem. ( ) – . https://doi.org/ . /abio. . .
[ ] p.l.l. bieber, t. abraham, t. helmrath, y. kim, r. dehlin, a rapid spectrophotometric assay for carnitine palmitoyltransferase, . https://ac.els-cdn.com/ / - s . - -main.pdf?_tid=d - - a - e e- a c &acdnat= _d bdaa a a b ccf a b ba d (accessed february , ).
[ ] l.b. willis, w. saridah, w. omar, ravigadevi sambanthamurthi, a.j. sinskey, non-radioactive assay for acetyl-coa carboxylase activity, . http://palmoilis.mpob.gov.my/publications/jopr sp -laura.pdf (accessed february , ).
[ ] g.e. racagni, e.e. machado de domenech, characterization of trypanosoma cruzi hexokinase, mol. biochem. parasitol. ( ) – . https://doi.org/ . / - ( ) - .
[ ] m.f. rütti, s. richard, a. penno, a. von eckardstein, t. hornemann, an improved method to determine serine palmitoyltransferase activity, j. lipid res. ( ) – . https://doi.org/ . /jlr.d -jlr .
[ ] a.
magdaleno, i.y. ahn, l.s. paes, a.m. silber, actions of a proline analogue, l-thiazolidine- -carboxylic acid (t c), on trypanosoma cruzi, plos one. ( ) e . https://doi.org/ . /journal.pone. .
[ ] f.s. damasceno, m.j. barison, e.m. pral, l.s. paes, a.m. silber, memantine, an antagonist of the nmda glutamate receptor, affects cell proliferation, differentiation and the intracellular cycle and induces apoptosis in trypanosoma cruzi, plos negl trop dis. ( ) e . https://doi.org/ . /journal.pntd. .
[ ] k. figarella, m. rawer, n.l. uzcategui, b.k. kubata, k. lauber, f. madeo, s. wesselborg, m. duszenko, prostaglandin d induces programmed cell death in trypanosoma brucei bloodstream form, cell death differ. ( ) – . https://doi.org/ . /sj.cdd. .
[ ] g.d. lopaschuk, s.r. wall, p.m. olley, n.j. davies, etomoxir, a carnitine palmitoyltransferase i inhibitor, protects hearts from fatty acid-induced ischemic injury independent of changes in long chain acylcarnitine, circ. res. ( ) – . https://doi.org/ . / .res. . . . .
[ ] c.m. koeller, n. heise, the sphingolipid biosynthetic pathway is a potential target for chemotherapy against chagas disease, enzyme res. ( ) – . https://doi.org/ . / / .
[ ] a.s. divakaruni, w.y. hsieh, l. minarrieta, t.n. duong, k.k.o. kim, b.r. desousa, a.y. andreyev, c.e. bowman, k. caradonna, b.p. dranka, d.a. ferrick, m. liesa, l. stiles, g.w. rogers, d. braas, t.p. ciaraldi, m.j. wolfgang, t. sparwasser, l. berod, s.j. bensinger, a.n. murphy, etomoxir inhibits macrophage polarization by disrupting coa homeostasis, cell metab. ( ) - .e . https://doi.org/ . /j.cmet. . . .
[ ] m. duszenko, k. figarella, e.t. macleod, s.c. welburn, death of a trypanosome: a selfish altruism, trends parasitol. ( ) – . https://doi.org/ . /j.pt. . . .
[ ] a.c. mariano, r. santos, m.s. gonzalez, d. feder, e.a. machado, b. pascarelli, k.c. gondim, j.r. meyer-fernandes, synthesis and mobilization of glycogen and trehalose in adult male rhodnius prolixus, arch. insect biochem. physiol. ( ) – . https://doi.org/ . /arch. .
[ ] j.m.c. ribeiro, f.a. genta, m.h.f. sorgine, r. logullo, r.d. mesquita, g.o. paiva-silva, d. majerowicz, m. medeiros, l. koerich, w.r. terra, c. ferreira, a.c. pimentel, p.m. bisch, d.c. leite, m.m.p. diniz, j.l. da s.g. v. junior, m.l. da silva, r.n. araujo, a.c.p. gandara, s. brosson, d. salmon, s. bousbata, n. gonzález-caballero, a.m. silber, m. alves-bezerra, k.c. gondim, m.a.c. silva-neto, g.c. atella, h. araujo, f.a. dias, c. polycarpo, r.j. vionette-amaral, p. fampa, a.c.a. melo, a.s. tanaka, c. balczun, j.h.m. oliveira, r.l.s. gonçalves, c. lazoski, r. rivera-pomar, l. diambra, g.a. schaub, e.s. garcia, p. azambuja, g.r.c. braz, p.l. oliveira, an insight into the transcriptome of the digestive tract of the bloodsucking bug, rhodnius prolixus, plos negl. trop. dis. ( ) e . https://doi.org/ . /journal.pntd. .
[ ] l. antunes, j. han, j. pan, c.j.c. moreira, p. azambuja, metabolic signatures of triatomine vectors of trypanosoma cruzi unveiled by metabolomics, plos one. ( ) . https://doi.org/ . /journal.pone. .
[ ] k.c. gondim, g.c. atella, e.g. pontes, d. majerowicz, lipid metabolism in insect disease vectors, insect biochem. mol. biol. ( ) – .
https://doi.org/ . /j.ibmb. . . .
[ ] p.r. bittencourt-cunha, l. silva-cardoso, g.a. de oliveira, j.r. da silva, a.b. da silveira, g.e.g. kluck, m. souza-lima, k.c. gondim, m. dansa-petretsky, c.p. silva, h. masuda, m.a.c. da silva neto, g.c. atella, perimicrovillar membrane assembly: the fate of phospholipids synthesised by the midgut of rhodnius prolixus, mem. inst. oswaldo cruz. ( ) – . https://doi.org/ . /s - .
[ ] d.e. wood, e.l. schiller, trypanosoma cruzi: comparative fatty acid metabolism of the epimastigotes and trypomastigotes in vitro, exp. parasitol. ( ) – . http://www.ncbi.nlm.nih.gov/pubmed/ .
[ ] d.e. wood, trypanosoma cruzi: fatty acid metabolism in vitro, exp. parasitol. ( ) – . http://www.ncbi.nlm.nih.gov/pubmed/ .
[ ] j.j. cazzulo, aerobic fermentation of glucose by trypanosomatids, faseb j. ( ) – . https://doi.org/ . /fasebj. . . . .
[ ] j.j. cazzulo, intermediate metabolism in trypanosoma cruzi, j. bioenerg. biomembr. ( ) – . http://www.ncbi.nlm.nih.gov/pubmed/ (accessed june , ).
[ ] b. frydman, c. santos, j.j.b. cannata, j.j. cazzulo, carbon- nuclear magnetic resonance analysis of [ - c]-glucose metabolism in trypanosoma cruzi. evidence of the presence of two alanine pools and of two co fixation reactions, eur. j. biochem. ( ) – . https://doi.org/ . /j. - . .tb .x.
[ ] a.e. leroux, d.a. maugeri, f.r. opperdoes, j.j. cazzulo, c.
nowicki, comparative studies on the biochemical properties of the malic enzymes from trypanosoma cruzi and trypanosoma brucei, fems microbiol. lett. ( ) – . https://doi.org/ . /j. - . . .x.
[ ] c. zelada, m. montemartini, j.j. cazzulo, c. nowicki, purification and partial structural and kinetic characterization of an alanine aminotransferase from epimastigotes of trypanosoma cruzi, mol. biochem. parasitol. ( ) – . https://doi.org/ . / - ( ) - .
[ ] m. montemartini, j. buá, e. bontempi, c. zelada, a.m. ruiz, j. santomé, j. josé cazzulo, c. nowicki, a recombinant tyrosine aminotransferase from trypanosoma cruzi has both tyrosine aminotransferase and alanine aminotransferase activities, fems microbiol. lett. ( ) – . https://doi.org/ . /j. - . .tb .x.
[ ] d. marciano, c. llorente, d.a. maugeri, c. de la fuente, f. opperdoes, j.j. cazzulo, c. nowicki, biochemical characterization of stage-specific isoforms of aspartate aminotransferases from trypanosoma cruzi and trypanosoma brucei, mol. biochem. parasitol. ( ) – . https://doi.org/ . /j.molbiopara. . . .
[ ] j.j. cazzulo, s. arauzo, b.m. franke de cazzulo, j.j.b. cannata, on the production of glycerol and l-alanine during the aerobic fermentation of glucose by trypanosomatids, fems microbiol. lett. ( ) – . https://doi.org/ . /j. - . .tb .x.
[ ] a. boveris, c.m. hertig, j.f. turrens, fumarate reductase and other mitochondrial activities in trypanosoma cruzi, mol. biochem. parasitol. ( ) – . https://doi.org/ . / - ( ) - .
[ ] g.w. rogerson, w.e. gutteridge, catabolic metabolism in trypanosoma cruzi, int. j. parasitol. ( ) – .
https://doi.org/ . / - ( ) - .
[ ] m.g. pereira, g. visbal, t.f.r. costa, s. frases, w. de souza, g. atella, n. cunha-e-silva, trypanosoma cruzi epimastigotes store cholesteryl esters in lipid droplets after cholesterol endocytosis, mol. biochem. parasitol. ( ) – . https://doi.org/ . /j.molbiopara. . . .
[ ] m.g. pereira, g. visbal, l.t. salgado, j.c. vidal, j.l.p. godinho, n.n.t. de cicco, g.c. atella, w. de souza, n. cunha-e-silva, trypanosoma cruzi epimastigotes are able to manage internal cholesterol levels under nutritional lipid stress conditions, plos one. ( ) e . https://doi.org/ . /journal.pone. .
[ ] s. allmann, m. mazet, n. ziebart, g. bouyssou, l. fouillen, j.-w. dupuy, m. bonneu, p. moreau, f. bringaud, m. boshart, triacylglycerol storage in lipid droplets in procyclic trypanosoma brucei, plos one. ( ) e . https://doi.org/ . /journal.pone. .
[ ] v.e. alvarez, g. kosec, c. sant’anna, v. turk, j.j. cazzulo, b. turk, autophagy is involved in nutritional stress response and differentiation in trypanosoma cruzi, j. biol. chem. ( ) – . https://doi.org/ . /jbc.m .
[ ] v.t. contreras, j.m. salles, n. thomas, c.m. morel, s. goldenberg, in vitro differentiation of trypanosoma cruzi under chemically defined conditions, mol biochem parasitol. ( ) – . https://www.ncbi.nlm.nih.gov/pubmed/ .
[ ] m.j. wainszelbaum, m.l. belaunzarán, e.m. lammel, m. florin-christensen, j. florin-christensen, e.l.d. isola, free fatty acids induce cell differentiation to infective forms in trypanosoma cruzi, biochem. j. ( ) – . https://doi.org/ . /bj .
[ ] c.c.p. aires, l. ijlst, f. stet, c. prip-buus, i.t. de almeida, m. duran, r.j.a. wanders, m.f.b. silva, inhibition of hepatic carnitine palmitoyl-transferase i (cpt ia) by valproyl-coa as a possible mechanism of valproate-induced steatosis, biochem. pharmacol. ( ) – . https://doi.org/ . /j.bcp. . . .
[ ] p.f. kantor, a. lucien, r. kozak, g.d. lopaschuk, the antianginal drug trimetazidine shifts cardiac energy metabolism from fatty acid oxidation to glucose oxidation by inhibiting mitochondrial long-chain -ketoacyl coenzyme a thiolase, circ. res. ( ) – . https://doi.org/ . / .res. . . .
[ ] c.d.l. folmes, a.s. clanachan, g.d. lopaschuk, fatty acid oxidation inhibitors in the management of chronic complications of atherosclerosis, curr. atheroscler. rep. ( ) – . https://doi.org/ . /s - - - .
[ ] w.c. stanley, s.r. meadows, k.m. kivilo, b.a. roth, g.d. lopaschuk, β-hydroxybutyrate inhibits myocardial fatty acid oxidation in vivo independent of changes in malonyl-coa content, am. j. physiol. circ. physiol. ( ) h –h . https://doi.org/ . /ajpheart. . .
[ ] r.s. o’connor, l. guo, s. ghassemi, n.w. snyder, a.j. worth, l. weng, y. kam, b. philipson, s. trefely, s. nunez-cruz, i.a. blair, c.h. june, m.c. milone, the cpt a inhibitor, etomoxir induces severe oxidative stress at commonly used concentrations, sci. rep. ( ) . https://doi.org/ . /s - - - .
[ ] c.-h. yao, g.-y. liu, r. wang, s.h. moon, r.w. gross, g.j. patti, identifying off-target effects of etomoxir reveals that carnitine palmitoyltransferase i is essential for cancer cell proliferation independent of β-oxidation, plos biol. ( ) e . https://doi.org/ . /journal.pbio. .
[ ] s.n. rampersad, multiple applications of alamar blue as an indicator of metabolic function and cellular health in cell viability bioassays, sensors (basel). ( ) – . https://doi.org/ . /s .
[ ] j.j. homsy, b. granger, s.m.
krassner, some factors inducing formation of metacyclic stages of trypanosoma cruzi, j. protozool. ( ) – . http://www.ncbi.nlm.nih.gov/pubmed/ (accessed january , ).

supporting information

s s fig. h-nmr analysis of excreted end products from glucose and threonine metabolism. the metabolic end products (succinate, acetate, alanine and lactate) excreted by the epimastigote cells that were incubated after h in pbs (a), pbs after h of starvation without (b) or with d-[u- c]-glucose (c) or palmitate (d) were determined by h-nmr. each spectrum corresponds to one representative experiment from a set of at least . a part of each spectrum ranging from . ppm to ppm is shown. the resonances were assigned as indicated: a , acetate; a , c-enriched acetate; al , alanine; al , c-enriched alanine; g , c-enriched glucose; l , lactate; l , c-enriched lactate; p , palmitate; s , succinate; and s , c-enriched succinate.

s s fig. time course activities of enzymes measured in this work. a) (acc) acetyl-coa carboxylase, b) (cpt ) carnitine-palmitoyltransferase, and c) (spt) serine palmitoyltransferase. all the activities were measured in cell-free extracts of epimastigote forms at different moments of the growth curve as indicated in the main text. all the measurements were performed in triplicate.

s to check if other well-known fao inhibitors have the same effect on the proliferation of t.
cruzi epimastigotes, we performed the same assay as described in materials and methods by evaluating different concentrations of valproic acid (av) [ ], trimetazidine [ , ] and β-hydroxybutyrate [ ], which are inhibitors of -ketothiolase. because they did not affect the proliferation of the epimastigote forms, we used the highest concentration evaluated in these assays to test whether the compounds inhibit fao, using co trapping with u- c-palmitate as a substrate. as observed, none of these compounds inhibited the co production from palmitate, confirming that they do not inhibit fao in t. cruzi.

s fig. other fao inhibitors did not affect cell proliferation and fao in the epimastigote forms. the compounds were evaluated at concentrations between . and µm. for positive controls of dead cells, a combination of antimycin ( . µm) and rotenone ( µm) was used. the maximum concentration tested for these compounds does not diminish co liberation from fao. a) valproic acid (av). b) trimetazidine (tmz). c) β-hydroxybutyrate (βhob).

s in this study, we showed that the epimastigote forms of t. cruzi present low sensitivity to eto treatment. recently, some groups described off-target effects when eto is used at concentrations of up to μm [ , ]. to validate eto as an fao inhibitor of t. cruzi, the parasites were incubated for h in pbs (negative control), . mm palmitate supplemented with bsa, . mm histidine, mm glucose, . mm carnitine and bsa without adding palmitate in the presence (or not) of μm eto.
the viability of these cells was inferred from the measured total reductive activity using mtt assays (see materials and methods section for more details). as expected, eto treatment did not affect the viability of cells incubated in glucose or histidine but did affect the viability of the cells incubated with palmitate or carnitine. surprisingly, we also observed an eto effect on parasites under metabolic stress, such as those incubated with pbs or bsa. this finding could be explained by the fact that under metabolic stress, the parasite mobilizes and consumes its internal lipids.

s fig. eto did not affect the viability of epimastigote forms in the presence of other carbon sources. the viability of epimastigote forms after incubation with different carbon sources and palmitate. the viability was assessed after h using mtt.

s because metacyclogenesis occurs in chemically defined conditions, we performed a viability assay to define the maximum tolerated concentration that allows the parasites to survive under eto treatment. stationary epimastigotes in tau- aag media were treated with different concentrations of eto (range to μm) for h. the viability of these cells was inferred by measuring the total reductive activity using an alamar blue assay [ ]. briefly, after h in the presence or absence of eto, the cells were incubated with , μg.ml- of alamar blue reagent in accordance with the protocol described in [ ]. under these conditions, the parasites were times more sensitive to eto treatment, surviving when subjected to eto concentrations between - µm (fig. s a). this range of concentrations was maintained in tau- aag medium to follow the differentiation by daily counts, based on the percentage of metacyclic trypomastigotes collected in the culture supernatant. to confirm that the parasites were still alive after days under differentiation, we checked the viability of cells that were treated (or not, control) using the same assay.
as shown above (figure s b), the parasites were viable under all the tested conditions. considering that tau- aag contains glucose in its composition, we performed in vitro metacyclogenesis using only proline as a metabolic inducer [ ]. as observed, even in the absence of glucose, eto treatment affected metacyclogenesis.

fig s . viability of epimastigote forms subjected to metacyclogenesis under different eto concentrations. a) cell viability under metacyclogenesis after h of treatment with different eto concentrations. b) cell viability under metacyclogenesis after days in the presence of eto. c) effect of eto on the metacyclogenesis induced by proline.
reconstitution of cargo-induced lc lipidation in mammalian selective autophagy

chunmei chang , , xiaoshan shi , , liv e. jensen , , adam l. yokom , , dorotea fracchiolla , , sascha martens , and james h. hurley , ,

department of molecular and cell biology at california institute for quantitative biosciences, university of california, berkeley, berkeley, ca , usa
department of biochemistry and cell biology, max perutz labs, university of vienna, vienna biocenter, dr. bohr-gasse , vienna, austria
aligning science across parkinson’s collaborative research network, chevy chase, md, usa

corresponding author: james h. hurley, orcid: - - - , e-mail: jimhurley@berkeley.edu

.cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . /

abstract

selective autophagy of damaged mitochondria, intracellular pathogens, protein aggregates, endoplasmic reticulum, and other large cargoes is essential for health. the presence of cargo initiates phagophore biogenesis, which entails the conjugation of atg /lc family proteins to membrane phosphatidylethanolamine. current models suggest that the presence of clustered ubiquitin chains on a cargo triggers a cascade of interactions from autophagic cargo receptors through the autophagy core complexes ulk and class iii pi -kinase complex i (pi kc -c ), wipi , and the atg , atg , and atg -atg -atg l machinery of lc lipidation. this model was tested using giant unilamellar vesicles (guvs), gst-ub as a model cargo, the cargo receptors ndp , tax bp , and optn, and the autophagy core complexes.
all three cargo receptors potently stimulated lc lipidation on guvs. ndp - and tax bp -induced lc lipidation required the ulk complex together with all other components; however, ulk kinase activity was dispensable. in contrast, optn bypassed the ulk requirement completely. these data show that the cargo-dependent stimulation of lc lipidation is a common property of multiple autophagic cargo receptors, yet the details of core complex engagement vary considerably and unexpectedly between the different receptors.

introduction

macroautophagy (hereafter autophagy) is an evolutionarily conserved catabolic pathway that sequesters intracellular components in double membrane vesicles, autophagosomes, and delivers them to lysosomes for degradation ( ). by removing excess or harmful materials like damaged mitochondria, protein aggregates, and invading pathogens, autophagy maintains cellular homeostasis and is cytoprotective ( ). autophagy is particularly important in maintaining the health of neurons, which are long-lived cells that have a high flux of membrane traffic. defective autophagy of mitochondria (mitophagy) downstream of mutations in pink and parkin is thought to contribute to the etiology of a subset of parkinson’s disease ( ). the de novo formation of the autophagosome, central to autophagy, entails the formation of a membrane precursor, termed the phagophore (or isolation membrane), that expands and seals around cytosolic cargoes ( ). a set of autophagy related (atg) proteins drives autophagosome biogenesis.
in mammalian cells, the unc- -like kinase (ulk ) complex, consisting of ulk itself, fip , atg , and atg , is typically recruited to the autophagosome formation site first. the class iii phosphatidylinositol -kinase complex i (pi kc -c ) is subsequently activated to generate phosphatidylinositol- -phosphate (pi( )p). pi( )p-enriched membranes serve as platforms to recruit the downstream effector wipis (wd-repeat protein interacting with phosphoinositides) and the atg /lc conjugation machinery ( , ). atg transfers phospholipids from the endoplasmic reticulum (er) to the growing phagophore ( - ), while atg translocates phospholipids from the cytoplasmic to the luminal leaflet, enabling phagophore expansion ( , ). the above-mentioned proteins are sometimes referred to as the “core complexes” of autophagy. the attachment of the atg proteins of the lc and gabarap subfamilies to the membrane lipid phosphatidylethanolamine (pe), termed lc lipidation, is a hallmark of autophagosome biogenesis. lc lipidation occurs via a ubiquitin-like conjugation cascade. the ubiquitin e -like atg and the e -like atg carry out the cognate reactions in the lc pathway. the atg -atg -atg l complex scaffolds transfer of lc from atg to pe ( , ). the role of the atg -atg -atg l complex is analogous to that of a ring domain ubiquitin e ligase, although there is no sequence homology between any of the subunits and ubiquitin e ligases. covalent anchoring of lc to the membrane is closely associated with phagophore membrane expansion ( - ) and cargo sequestration ( - ).
although recent evidence showed that autophagosome formation can still occur in mammalian cells lacking all six atg family proteins ( , ), the resulting autophagosomes were smaller and their lysosomal fusion was impaired. lc lipidation is thus involved in multiple steps in autophagosome biogenesis, and is critical for promoting autophagosome-lysosome fusion ( , ). in most instances in mammalian cells, autophagy is highly selective and tightly regulated ( ). several targets of selective autophagy have been described, including mitochondria (mitophagy), intracellular pathogens (xenophagy), aggregated proteins (aggrephagy), endoplasmic reticulum (reticulophagy), lipid droplets (lipophagy), and peroxisomes (pexophagy) ( ). selectivity relies on a family of autophagy receptors, which specifically bind to cargoes and the phagophore ( - ). some types of selective autophagy, such as aggrephagy, mitophagy, and xenophagy, are initiated by the ubiquitination of cargoes, which are recognized by a subset of cargo receptors including p (sequestosome- ), nbr , optineurin (optn), ndp and tax -binding protein (tax bp ). all of these receptors contain an lc -interaction region (lir), a ubiquitin binding domain (ubd), and a dimerization/oligomerization domain ( , , ). these cargo receptors are well known to connect cargo to the phagophore through their interaction with both clustered ubiquitin chains and membrane-conjugated lc . several cargo receptors have recently been shown to trigger autophagy initiation, thus functioning upstream of lc lipidation. ndp directly recruits the ulk complex to damaged mitochondria and intracellular bacteria by binding to the coiled-coil (cc) of the fip subunit of the ulk complex ( - ). p is also recruited to fip , but to its claw domain instead of the cc ( ). initiation of mitophagy by optn, however, appears to be independent of ulk ( ).
these findings are beginning to reveal different roles for various cargo receptors in triggering early autophagy machinery assembly via distinct entry points. in vitro reconstitution studies have recently been shown to recapitulate the steps in autophagosome formation, especially in yeast autophagy ( - ). these in vitro approaches are powerful tools for investigating the molecular mechanisms of such a complicated cell biological process, because they allow control over the multi-component compositions and spatiotemporal arrangements. however, it has been challenging to reconstitute mammalian autophagosome formation because of the complexity of the mammalian autophagy machinery. as part of a long-term effort, we recently reconstituted the events from pi( )p production by pi kc -c to lc lipidation in mammalian autophagy in a giant unilamellar vesicle (guv) model system ( ). separately, we showed that in the presence of clustered ubiquitin chains, ndp promotes recruitment of the ulk complex to membranes ( ). here, using the guv model system, and focusing on autophagy receptors involved in mitophagy, we established a start-to-finish reconstitution of selective autophagy initiation from autophagy receptor engagement through lc lipidation. we found that ndp , tax bp and optn triggered robust lc lipidation in the presence of the ulk complex, pi kc -c complex, wipi , and lc conjugation machinery. lc lipidation triggered by ndp and tax bp was dependent on both ulk and pi kc -c , while optn-induced lc lipidation was only dependent on the activity of pi kc -c .
we further found that these cargo receptors trigger lc lipidation through distinct multivalent webs of interactions, thereby enabling the rapid lc lipidation needed for autophagosome formation.

results

reconstitution of ndp - and tax bp -triggered lc lipidation

we sought to establish a purified system that recapitulates the initiation of mitophagy, which is known to utilize the cargo receptors ndp , tax bp , and optn ( - ), together with the core autophagy initiation machinery and intracellular membranes. we used guvs with an er-like lipid composition to mimic the membranes and a mixture of linear tetraubiquitin and cargo receptors to mimic cargo signals. these were incubated with a set of purified core autophagy machineries that are involved in autophagy initiation, including the ulk complex, the pi kc -c complex, the pi( )p effector wipi d, the e -related atg , the e -related atg , the functional ubiquitin e ligase counterpart atg -atg -atg l complex (hereafter referred to as “e ” or “e complex” for brevity), and lc b (fig. a and fig. s ). all proteins and complexes used were full-length and wild-type, with the exception of the fip d - construct (fig. s a), which was engineered to increase stability (fig. s b) and prevent non-specific aggregation. negative stain electron microscopy (nsem) images showed that fip d - had essentially the same structure as wild-type, while losing its propensity to aggregate (fig. s c) ( ). contour lengths and end-to-end distances of fip d - as analyzed by nsem were comparable to those of the full-length protein (fig. s d) ( ). all of the fluorescently tagged fusion constructs were previously characterized and shown to be functional ( , ). the typical concentrations of autophagy proteins in human cells are unknown, but as most are thought to be scarce, we used the following concentrations for all reconstitution reactions: µm gst-ub , nm cargo receptors, nm ulk complex, nm pi kc -c complex, nm wipi d, nm e complex, nm atg , nm atg , and nm mcherry-lc b.

we first investigated the recruitment of the e complex to membranes, since the localization of the e complex dictates the sites of lc lipidation in cells ( ). we observed that in the presence of wipi d and the lc conjugation machinery, but not the ulk complex or pi kc -c , little or no gfp-e or mcherry-lc b was recruited to the guvs within min (fig. b, first column). the addition of pi kc -c triggered the membrane recruitment of both e and lc b (fig. b, second column), consistent with the previous observation that pi kc -c activity is required for e membrane targeting and lc lipidation ( ). ulk phosphorylates core subunits of pi kc -c ( , ), but the addition of the ulk complex together with pi kc -c had similar effects to pi kc -c alone (fig. b, third column). because ndp and the ulk complex have been shown to interact directly with the e complex ( - ), we asked whether ndp or ulk could mediate the membrane recruitment of the e complex. the addition of ndp with gst-ub , alone or together with the ulk complex, did not result in an obvious increase in membrane enrichment of e and subsequently of lc b (fig. b, fourth and fifth columns). however, ndp and gst-ub did enhance pi kc -c -triggered e and lc b membrane recruitment (fig. b, sixth column). membrane recruitment was further enhanced when the ulk complex was added (fig. b, last column).
quantification of the kinetics of membrane binding showed that both e recruitment and lc lipidation were fastest when all the components were present (fig. c and d). the kinetics of e recruitment to membranes were faster than those of lc (fig. c and d), indicating that the e is being recruited in a catalytically competent form. we found that lc lipidation was slightly more efficient in the presence of the fip d - version of the ulk complex relative to the version containing wild-type fip (fig. s e). we thus used the fip d - ulk complex for all further reconstitution assays and refer to it hereafter as simply the “ulk complex”. taken together, these data show that ndp triggers efficient lc lipidation when both ulk and pi kc -c complexes are present.

we next tested tax bp , a structural paralog of ndp which has roles in mitophagy, xenophagy, and aggrephagy ( , , ). we found that, similar to ndp , tax bp induced the most robust and efficient e membrane binding and lc lipidation when both the ulk and pi kc -c complexes were included in the system (fig. e and f and fig. s ). although the addition of tax bp and gst-ub resulted in a slightly stronger e recruitment than ndp , no obvious increase in lc lipidation was observed (fig. f and fig. s , fourth and fifth columns). together, these data indicate that, like ndp , tax bp can trigger robust lc lipidation in response to a cargo mimetic in vitro, and that this depends on both the ulk and pi kc -c complexes.
reconstitution of optn-triggered lc lipidation

we went on to investigate another cargo receptor, optn, which has been shown to mediate parkin-dependent mitophagy ( ). residues s and s of optn are phosphorylated by tank-binding kinase (tbk ), and this phosphorylation was reported to enhance the binding of optn to both lc and ubiquitin ( ). we first tested the phosphomimetic double mutant of optn s d/s d, hereafter “optns d”. we observed that optns d and gst-ub alone induced a modest recruitment of both e and lc to the guv membrane, similar to the addition of pi kc -c (fig. a, first four columns). in addition, optns d and gst-ub dramatically increased e recruitment and lc lipidation by pi kc -c (fig. a, sixth column). however, in contrast to the situation with ndp or tax bp , the addition of the ulk complex had no effect on either e or lc binding triggered by the optns d-ub-pi kc -c axis (fig. a, last column). the dynamics of lc lipidation and e binding when optns d, gst-ub and pi kc -c were present were essentially the same in the presence or absence of the ulk complex (fig. b and c). these data indicate that optns d can also trigger robust lc lipidation but, as distinct from ndp and tax bp , optn-triggered lc lipidation depended only on the activity of pi kc -c . we compared the kinetics of e recruitment with that of lc in the presence of all components; lc was recruited more slowly than e , with a mean lag of . min (fig. s ). we also evaluated the activity of wild-type optn in the presence or absence of pi kc -c . optnwt and gst-ub also enhanced membrane binding of e and lc lipidation, but more weakly than optns d (fig. d and e), indicating that the higher affinities for lc and ubiquitin contributed to faster lc lipidation in the presence of optns d.
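the lag between e recruitment and lc lipidation reported above can be illustrated with a small calculation. the sketch below is only one plausible way to estimate such a lag and is not taken from the source, which does not state how the lag was computed: it assumes that each fluorescence time course is sampled at the same time points, takes the time at which each trace first reaches half of its maximal rise (with linear interpolation between samples), and reports the difference. the function names `half_max_time` and `recruitment_lag` and the example values are hypothetical.

```python
def half_max_time(times, signal):
    """Return the first time at which `signal` reaches half of its maximal rise.

    Hypothetical helper: the baseline is taken as the first sample, and the
    crossing point is found by linear interpolation between adjacent samples.
    """
    baseline = signal[0]
    half = baseline + 0.5 * (max(signal) - baseline)
    for i, value in enumerate(signal):
        if value >= half:
            if i == 0:
                # Flat trace or signal already at half-maximum at the start.
                return float(times[0])
            prev = signal[i - 1]
            frac = (half - prev) / (value - prev)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("signal never reaches half-maximum")


def recruitment_lag(times, upstream, downstream):
    """Lag of the downstream trace relative to the upstream trace.

    A positive value means the downstream signal reaches its half-maximum later.
    """
    return half_max_time(times, downstream) - half_max_time(times, upstream)


if __name__ == "__main__":
    # Made-up intensities for illustration only; these are not data from the study.
    minutes = [0, 1, 2, 3, 4]
    upstream_trace = [0, 2, 4, 4, 4]     # stands in for an upstream recruitment signal
    downstream_trace = [0, 0, 2, 4, 4]   # stands in for a downstream lipidation signal
    print(recruitment_lag(minutes, upstream_trace, downstream_trace))
```

a half-maximum criterion is used here because it does not depend on the absolute intensity of the two fluorophores; other definitions (for example, fitting both traces and comparing fitted time constants) would be equally reasonable and may give different values.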
the kinase activity of ulk is dispensable for cargo receptor-induced lc lipidation

we next asked whether ulk kinase activity is required for receptor-induced lc lipidation, as the ulk kinase has been reported to phosphorylate multiple downstream autophagy components upon autophagy initiation, including the atg and becn subunits of pi kc -c , atg l and atg a ( , , - ). we found that in the presence of ndp , gst-ub , pi kc -c , wipi d, and the lc conjugation machinery, the ulk kinase dead (kd) complex accelerated both e membrane recruitment and lc lipidation to the same extent as the wild-type complex (fig. a and b). in contrast, in the presence of optns d and all the other components, neither the wild-type nor the kd ulk complex enhanced e binding or lc lipidation (fig. s ). these data support the conclusion that optn-triggered lc lipidation is independent of both the catalytic and non-catalytic activities of the ulk complex.

kinetics of ulk complex recruitment to membranes

to analyze the differences between the three cargo receptors in more detail, we went on to investigate the kinetics of the recruitment of the upstream components as triggered by cargo receptors. we first monitored the kinetics of ulk complex recruitment. in the presence of wipi d and conjugation machinery, no detectable gfp-tagged ulk complex was recruited to the membrane (fig. a, first column). the addition of ndp or tax bp -ub, with gst-ub , dramatically enhanced the membrane binding of the ulk complex (fig.
a, second and third columns, b, and c), consistent with the previous observations that ndp directly recruited the ulk complex in mitophagy or xenophagy ( - ). we noticed that only a little lc lipidation occurred even though the ulk complex was enriched on the membrane upon the addition of gst-ub with either ndp or tax bp (fig. a, second and third columns, b, and c). this suggested that the ulk complex alone is insufficient to activate lc conjugation. in contrast to ndp or tax bp , the addition of optns d and gst-ub did not result in any increased ulk membrane binding (fig. a fourth column, and d). however, when pi kc -c was added to the reaction, we observed obvious membrane binding of the ulk complex, which was further enhanced by the addition of ndp , tax bp or even optns d (fig. a, last four columns, b to d). these data were interpreted in terms of a two-step recruitment of the ulk complex to the membrane, in which the ulk complex is initially recruited to the membrane by ndp or tax bp , but not optn. once pi kc -c is active on the membrane, ulk complex recruitment is promoted further, even when pi kc -c is recruited downstream of optn only.

we then sought to understand the mechanism of pi kc -c -dependent recruitment of the ulk complex as triggered by optns d. omission of pi kc -c or wipi d almost completely eliminated the membrane binding of the ulk complex. depletion of the e complex also largely decreased ulk complex recruitment; however, depletion of atg or lc slightly increased ulk membrane recruitment (fig. e and f).
as expected, omission of any of the components downstream of the ulk complex eliminated optns d-triggered lc lipidation (fig. e and f). these data indicate that the pi kc -c -wipi d-e axis is required for the further recruitment of ulk , which is consistent with the previous observations that the translocation of the ulk complex to omegasomes was stabilized by sustained pi( )p synthesis ( ) and that fip could form a trimeric complex with atg and wipi ( ). the lack of dependence on atg or lc rules out the possibility that a ulk lir motif-lc interaction ( , ) is driving ulk recruitment in these experiments. this recruitment of the ulk complex downstream of optn does not lead to a feed-forward increase in optn-triggered lc lipidation, given that lc lipidation is similar in the presence or absence of the ulk complex (fig. b). this suggests that in optn-triggered mitophagy, the ulk complex functions at a stage of autophagosome formation subsequent to lc lipidation or in other processes that act in parallel to lc lipidation.

kinetics of pi kc -c recruitment to membranes

we next monitored the kinetics of pi kc -c complex recruitment during lc lipidation. as distinct from the ulk complex, the intrinsic membrane affinity of pi kc -c enabled it to bind membranes in the presence of wipi d and conjugation machinery even without cargo receptors (fig. a, first column). this is consistent with the observation that pi kc -c alone can trigger lc lipidation in the absence of the ulk complex ( ). the addition of optns d and gst-ub , but not gst-ub with ndp or tax bp , dramatically enhanced the membrane binding of the pi kc -c complex (fig. a, first four columns, b to d). the ulk complex alone did not increase pi kc -c membrane binding (fig. a, fifth column). however, the addition of the ulk complex did promote membrane recruitment of pi kc -c in the presence of ndp or tax bp , although not optns d (fig. a, last three columns, b to d). these data indicate that optn strongly enhances membrane recruitment of pi kc -c on its own.
ndp and tax bp have a similar ultimate effect, but only in the presence of the ulk complex.

given the dramatic increase of pi kc -c membrane binding triggered by optn, we sought to understand its mechanism. omission of wipi d almost completely blocked the membrane binding of pi kc -c (fig. e and f), consistent with the previous observation that pi kc -c and wipi d cooperatively bind to membranes ( ). in contrast, the depletion of atg , atg , or lc , but not e , resulted in a slight decrease of pi kc -c binding (fig. e and f), suggesting that a multivalent assembly of optn, pi kc -c , wipi d, and e may be responsible for the membrane recruitment of pi kc -c by optn.

interactions between cargo receptors and the core autophagy machinery

we found that ndp , tax bp and optn trigger robust lc lipidation by dramatically enhancing the membrane binding of the ulk or pi kc -c complex, and that they do so by distinct mechanisms. we therefore hypothesized that these cargo receptors could interact with the autophagy core complexes in distinct ways. to test this, we systematically analyzed the binding between these cargo receptors and different autophagy components using a microscopy-based bead interaction assay (fig. a). the ulk complex was specifically recruited to beads coated with ndp and tax bp , but not optns d (fig. b and g), consistent with the observation that ndp or tax bp directly recruited the ulk complex to the membrane. however, no detectable pi kc -c complex was recruited to beads coated with optns d. instead, weak binding between pi kc -c and ndp or tax bp was detected (fig.
c and g), suggesting that the increased membrane binding of pi kc -c by optn was not mediated by a direct interaction. weak binding between optns d and wipi d, ndp and wipi d, optns d and e , and ndp and e was also observed (fig. d, e and g). we noticed a strong interaction between tax bp and wipi d or e (fig. e and g), which may explain the stronger membrane recruitment of e by tax bp and gst-ub alone. interactions between ndp , tax bp , optns d and lc b were weak at the tested concentrations (fig. f and g). these data indicate that cargo receptors directly bind to multiple autophagy components, and thus trigger lc lipidation through a multivalent web of both strong and weak interactions.

discussion

over the past two years, rapid advances in the mechanistic cell biology of autophagy, and the elucidation of new activities for autophagy proteins, have crystallized into detailed models for mechanisms of autophagosome formation ( , ). the ability to biochemically reconstitute such a pathway from purified components is a stringent and powerful test of such models. moreover, reconstitution allows nuanced aspects of the interplay between components to be assessed with a rigor that is difficult in vivo. in the yeast model system, it was recently shown that a set of purified components could recapitulate the cargo-stimulated atg lipidation and lipid transfer into atg vesicles, confirming the function of atg vesicles as the seeds of the phagophore ( ).
progress in the reconstitution of human autophagy is less advanced, despite the importance of selective autophagy in many human diseases. we previously reconstituted the pi kc -c , wipi , and e circuit, demonstrating positive feedback ( ). upstream of this circuit, we found that ndp mediated the cargo-initiated recruitment of the ulk complex to membranes in vitro ( ). here, we showed that it was possible to reconstitute the circuit connecting the major selective cargo receptors involved in mitophagy (ndp , tax bp and optn) from cargo recognition to lc lipidation, with each receptor manifesting unique properties (fig. ).

one of the important recent conceptual advances in selective autophagy was the discovery that cargo receptors function upstream of the core autophagy initiating complexes ( , - ). this paradigm replaced the earlier model that cargo receptors connected substrates to pre-existing lc -lipidated membranes. here, we showed that cargo-engaged ndp , tax bp , or optn were capable of potently driving lc lipidation in the presence of physiologically plausible nanomolar concentrations of the purified autophagy initiation complexes. these reconstitution data directly confirm the new model for cargo-induced formation of lc -lipidated membranes in human cells.

we found that different cargo receptors use distinct mechanisms to trigger lc lipidation downstream of cargo. ndp is strongly dependent on the presence of the ulk complex, consistent with findings in xenophagy ( ) and mitophagy ( ). tax bp behaves much like ndp , as expected based on the common presence of an n-terminal skich domain, the locus of fip binding ( ). however, tax bp was more active in promoting lc lipidation in the absence of pi kc -c . unexpectedly strong binding was observed between tax bp and e and wipi d. this raises the possibility that tax bp -mediated selective autophagy may be less dependent on the core complexes than that mediated by ndp .
in sharp contrast to ndp and tax bp , the in vitro lc lipidation downstream of optn is completely independent of the ulk complex. our finding is consistent with the recent report that optn-induced mitophagy, in contrast to ndp , does not depend on the recruitment of fip ( ). it is also consistent with our observation of direct binding of ndp and tax bp , but not optn, to the ulk complex in vitro. we found that optn-induced lc lipidation in vitro is strongly dependent on pi kc -c and wipi d, as expected based on the roles of these proteins in e activation. no single one of these complexes, or the e itself, bound strongly to optn, but all of them bound weakly. this suggests that a multiplicity of weak interactions with several factors contributes to the recruitment of the core complexes downstream of optn. atg a ( ), which was not present in this study, likely contributes further to this multivalent web of low-affinity interactions.

subunits of pi kc -c are phosphorylated by the ulk kinase ( , ), and it has long been assumed that these phosphorylation events would promote autophagy. we found, however, that the kinase-dead version of the ulk complex was as effective in promoting lc lipidation as wild-type. this result is consistent with a recent pharmacological study that found ulk kinase activity to be dispensable for pi kc -c activation at p condensates ( ).

in conclusion, we have reconstituted much of the process of cargo-stimulated selective autophagy using purified human proteins.
the remaining steps still to be completed in vitro are the atg and atg -dependent transfer of phospholipids for phagophore growth, and the engulfment of cargo. the observations here provide powerful confirmation for the model that cargo itself triggers formation of lc -lipidated membranes on a just-in-time basis. they also reveal nuances of how different cargo receptors utilize distinct repertoires of weak and strong interactions with the core complexes to trigger lc lipidation. these are subtleties that would have been difficult to uncover in traditional cellular knockout and rescue experiments. these unique modes of core complex recruitment may underlie the divergent core complex phenotypes that are seen in different classes of selective autophagy in different cellular contexts.

acknowledgements

this work was supported by the aligning science across parkinson’s collaborative research network asap- (j.h.h. and s.m.), hfsp (rgp / to j.h.h. and s.m.), nih r gm (j.h.h.), and the jane coffin childs foundation (a. l. y.).

conflict of interest

j.h.h. is a co-founder of casma therapeutics. s.m. is a member of the scientific advisory board of casma therapeutics.

materials and methods

plasmid construction

synthetic codon-optimized dnas encoding components of the human ulk complex and pi kc -c complex were subcloned into the pcag vector with a gst, mbp or twinstrep-flag (tsf) tag. synthetic codon-optimized dnas encoding human atg -atg -atg were subcloned into the pgbdest vector with a strep tag. dna encoding human wipi d was subcloned into the pcag vector with a tsf tag.
dna encoding mouse atg was subcloned into the pfast bacht vector with a his tag. dnas encoding human atg and lc b were subcloned into the pet vector with a his tag. dnas encoding human ndp , optn and tax bp were subcloned into the pgst vector with a gst tag. dna encoding linear tetraubiquitin was subcloned into the pgex vector with a gst tag. details are shown in table s .

protein expression and purification

the ulk complex, pi kc -c complex and wipi d protein were expressed and purified from hek gnti cells as previously described ( , ). dnas were transfected into cells using polyethylenimine (polysciences). after - h expression, cells were harvested and lysed with lysis buffer ( mm hepes ph . , % triton x- , mm nacl, mm mgcl , % glycerol, and mm tcep) supplemented with edta-free protease inhibitors (roche). the lysate was clarified by centrifugation ( rpm at °c for h) and incubated with resins.

to purify gst-fip d - -mbp and gst-fip -mbp, the supernatant was incubated with glutathione sepharose b (ge healthcare) with gentle shaking at °c for h. the mixture was then loaded onto a gravity flow column, and the resin was washed extensively with wash buffer ( mm hepes ph . , mm nacl, mm mgcl and mm tcep). eluted protein samples were passed through amylose resin (new england biolabs) for a second step of affinity purification. the final buffer after mbp affinity purification was mm hepes ph . , mm nacl, mm mgcl , mm tcep and mm maltose.

to purify (±gfp)-ulk complex for studying the effect of fip d - and ulk (kinase-dead mutant), the fip /atg /atg subcomplex and ulk were expressed and purified separately. after the first step of affinity purification, the two samples were mixed, cleaved by tev at °c overnight, and subjected to a second step of affinity purification using the mbp tag. the final buffer after mbp affinity purification was mm hepes ph . , mm nacl, mm mgcl , mm tcep and mm maltose. for the rest of the guv experiments, the fip /atg /atg subcomplex and ulk were expressed and purified separately in both the first and second steps of affinity purification. the final buffer after the second-step mbp affinity purification was mm hepes ph . , mm nacl, mm mgcl , mm tcep and mm maltose. the complexes were used immediately for the guv assays.

to purify (±gfp)-pi kc -c complex, the supernatant was incubated with glutathione sepharose b (ge healthcare) at °c for h, applied to a gravity column, and washed extensively with wash buffer ( mm hepes ph . , mm nacl, mm mgcl , and mm tcep). the protein complexes were eluted with wash buffer containing mm reduced glutathione, and then treated with tev protease at °c overnight. tev-treated complexes were loaded on a strep-tactin sepharose gravity flow column (iba, gmbh). the complexes were eluted with a final buffer containing mm hepes ph . , mm nacl, mm mgcl , mm tcep, and mm desthiobiotin (sigma), and then used immediately for the guv assays.

to purify (±gfp)-wipi d protein, the supernatant was incubated with strep-tactin sepharose resin at °c for h, applied to a gravity column, and washed extensively with wash buffer ( mm hepes ph . , mm nacl, and mm tcep). the proteins were eluted with wash buffer containing mm desthiobiotin and applied onto a superdex column ( / prep grade, ge healthcare). the final buffer after gel filtration was mm hepes ph . , mm nacl, and mm tcep. fractions containing pure (±gfp)-wipi d protein were pooled, concentrated, snap frozen in liquid nitrogen and stored at - °c.
the atg -atg -atg complex and atg protein were expressed and purified from sf cells as previously described ( ). sf cells were infected with a single virus stock p corresponding to the polycistronic construct coding for the atg -atg -atg complex or atg . cells were harvested h after infection, lysed and clarified following the same procedure described above for mammalian cells.

to purify (±gfp)-atg -atg -atg complex, the supernatant was incubated with strep-tactin sepharose at °c for h, applied to a gravity column, and washed extensively with wash buffer ( mm hepes ph . , mm nacl, and mm tcep). the proteins were eluted with wash buffer containing mm desthiobiotin and applied onto a superdex column ( / increase). the final buffer after gel filtration was mm hepes ph . , mm nacl, and mm tcep. peak fractions containing pure (±gfp)-atg -atg -atg complexes were pooled, snap frozen in liquid nitrogen and stored at - °c.

to purify atg protein, the supernatant was loaded on a ni-nta (ge healthcare) gravity flow column and washed extensively with wash buffer ( mm hepes ph . , mm nacl, mm imidazole and mm tcep). the proteins were eluted with wash buffer containing mm imidazole and applied onto a superdex column ( / prep grade). the final buffer after gel filtration was mm hepes ph . , mm nacl, and mm tcep. peak fractions containing pure atg protein were pooled, snap frozen in liquid nitrogen and stored at - °c.

the linear tetraubiquitin, ndp , optn, tax bp , atg and mcherry-lc b were expressed and purified from e. coli (bl de ).
protein expression was induced with μm iptg when cells were grown to an od of . and further grown at °c overnight. cells were harvested and stored at - °c if needed.

to purify gst-tagged linear tetraubiquitin and receptors, the pellets were resuspended in a buffer containing mm hepes ph . , mm nacl, mm tcep and protease inhibitors (roche), and sonicated before being cleared at rpm at °c for h. the supernatant was incubated with glutathione sepharose b at °c for h, applied to a gravity column, and washed extensively with wash buffer ( mm hepes ph . , mm nacl, and mm tcep). the proteins were eluted with wash buffer containing mm reduced glutathione, and then applied onto a superdex column ( / increase). the final buffer after gel filtration was mm hepes ph . , mm nacl, and mm tcep. peak fractions containing pure proteins were pooled, snap frozen in liquid nitrogen and stored at - °c.

to purify atg and mcherry-lc b, the pellets were resuspended in a buffer containing mm hepes ph . , mm nacl, mm tcep, mm imidazole and protease inhibitors, sonicated and clarified. the supernatant was loaded on a ni-nta (ge healthcare) gravity flow column and washed extensively with wash buffer ( mm hepes ph . , mm nacl, mm imidazole and mm tcep). the proteins were eluted with wash buffer containing mm imidazole and applied onto a superdex column ( / prep grade). the final buffer after gel filtration was mm hepes ph . , mm nacl, and mm tcep. fractions containing pure proteins were pooled, concentrated, snap frozen in liquid nitrogen and stored at - °c.

preparation of giant unilamellar vesicles (guvs)

guvs were prepared by hydrogel-assisted swelling as described previously ( ). briefly, μl % (w/w) polyvinyl alcohol (pva) with a molecular weight of , (millipore) was coated onto a plasma-cleaned coverslip of mm diameter. the coated coverslip was placed in a heating incubator at °c to dry the pva film for min. for all the guv experiments, a lipid mixture with a molar composition of . % dopc, % dope, % popi, % dops and . % atto n dope at mg/ml was spread uniformly onto the pva film. the lipid-coated coverslip was then put under vacuum overnight to evaporate the solvent. μl mosm sucrose solution was used for swelling for h at room temperature, and the vesicles were then harvested and used within h. atto n dope (atto tec) was used as the guv membrane dye. all the other lipids for guv preparation were from avanti polar lipids.

in vitro reconstitution guv assay

the reactions were set up in an eight-well observation chamber (lab tek) at room temperature. the chamber was coated with mg/ml β casein for min and washed three times with reaction buffer ( mm hepes at ph . , mm nacl and mm tcep). a final concentration of µm gst- xub, nm cargo receptors, nm ulk complex, nm pi kc -c complex, nm wipi d, nm atg -atg -atg complex, nm atg , nm atg , nm mcherry-lc b, µm atp, and mm mncl was used for all reactions unless otherwise specified. µl guvs were added to initiate the reaction in a final volume of µl. after min incubation, during which random views were picked for imaging, time-lapse images were acquired in multitracking mode on a nikon a confocal microscope with a × plan apochromat . na objective. three biological replicates were performed for each experimental condition. identical laser power and gain settings were used during the course of all conditions.

microscopy-based bead protein-protein interaction assay

a mixture of .
µm gst or gst-tagged cargo receptors and different atg proteins was incubated with µl glutathione sepharose beads (ge healthcare) in a reaction buffer containing mm hepes at ph . , mm nacl and mm tcep. the final concentrations of the different atg proteins were as follows: nm gfp-ulk complex, nm gfp-pi kc -c complex, nm gfp-wipi d, nm atg -atg -atg -gfp complex, and nm mcherry-lc b. after incubation at room temperature for min, the beads were washed three times, suspended in µl reaction buffer, and then transferred to the observation chamber for imaging. images were acquired on a nikon a confocal microscope with a × plan apochromat . na objective. three biological replicates were performed for each experimental condition.

negative stain electron microscopy preparation, collection and coiled-coil tracing

protein sample of gst-fip d - -mbp was diluted to nm concentration in elution buffer. µl of sample was applied to continuous carbon grids which were glow discharged in a pelco easiglow instrument for s at mamps. protein was wicked away using torn whatman paper and immediately stained with % uranyl acetate. wicking was repeated for a second round of % uranyl acetate staining. data were collected at kv on a tecnai t microscope with a nominal magnification of x. micrographs were taken with a gatan ccd k x k camera at a pixel size of . Å/pixel. protein particles were manually selected using the manual picking tool within relion . and extracted at a binned box size of by corresponding to a pixel size of . Å/pixel.
extracted particles were measured for coiled-coil length in fiji as previously described ( ). in brief, single particles were traced using the simple neurite tracer plugin for fiji. histograms of the data were prepared for both the path length of the coiled-coil ( nm) and the end-to-end distance of the coiled-coil ( nm).

image quantification

guv images were analyzed using a custom script implemented in python . (https://github.com/hurley-lab/guvquantification/blob/main/guvintensity- channel.ipynb). briefly, to obtain the outline of all the vesicles within a field of view, images were segmented into regions corresponding to local maxima of the membrane fluorescence channel, which were defined by applying an otsu threshold to the differences between local maxima and minima. then, binding of the fluorescently labelled proteins was quantified by taking the mean value of these segmented pixels in the fluorescent protein channel. background was calculated as the average of the vesicle-internal background and the vesicle-external background and subtracted from the fluorescence signal. the intensity trajectories of multiple fields of view were then obtained frame by frame. multiple intensity trajectories were calculated, and their averages and standard deviations were reported.

for quantification of protein binding intensity on beads, the outline of each individual bead was manually defined based on the bright-field channel.
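the guv quantification steps described above (otsu segmentation of the membrane channel, mean protein-channel intensity over the segmented pixels, subtraction of the averaged internal and external background, and per-frame averaging across fields of view) can be illustrated with a minimal sketch. this is not the authors' notebook: it assumes, for simplicity, that the otsu threshold is applied directly to flattened membrane-channel pixel values rather than to differences between local maxima and minima, and that background pixels are supplied by the caller. the function names `otsu_threshold`, `membrane_signal` and `summarize_trajectories` are hypothetical.

```python
from statistics import mean, stdev


def otsu_threshold(values, bins=64):
    """Return the intensity cut that maximizes between-class variance (Otsu)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1

    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_var, best_bin = -1.0, 0
    w0, sum0 = 0, 0.0
    for i, h in enumerate(hist):
        w0 += h               # pixels at or below bin i
        sum0 += i * h
        w1 = total - w0       # pixels above bin i
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_bin = between, i
    # Upper edge of the best bin separates the two classes.
    return lo + (best_bin + 1) * width


def membrane_signal(membrane, protein, inside_bg, outside_bg):
    """Background-subtracted mean protein intensity on membrane pixels.

    membrane, protein: flattened pixel lists of equal length (same field of view).
    inside_bg, outside_bg: protein-channel pixels sampled inside and outside
    vesicles (assumed to be chosen by the caller).
    """
    cut = otsu_threshold(membrane)
    on_membrane = [p for m, p in zip(membrane, protein) if m > cut]
    background = (mean(inside_bg) + mean(outside_bg)) / 2
    return mean(on_membrane) - background


def summarize_trajectories(trajectories):
    """Per-frame (mean, sample standard deviation) across fields of view."""
    return [(mean(frame), stdev(frame)) for frame in zip(*trajectories)]


if __name__ == "__main__":
    membrane = [10, 12, 11, 200, 210, 205, 9, 10]
    protein = [5, 6, 5, 50, 52, 48, 4, 5]
    print(membrane_signal(membrane, protein, inside_bg=[4, 6], outside_bg=[2, 4]))
    print(summarize_trajectories([[1, 2, 3], [3, 4, 5]]))
```

each inner list passed to `summarize_trajectories` is one field of view, with one background-subtracted value per frame, so the output gives the average and standard deviation at each time point.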
the intensity threshold was calculated from the average intensities of pixels inside and outside of the bead, and intensity measurements of individual beads were then obtained. averages and standard deviations were calculated among the measured values for each condition and plotted in a bar graph.

statistical analysis

statistical analysis was performed by unpaired student’s t test using graphpad prism . p < . was considered statistically significant.

references

. n. mizushima, m. komatsu, autophagy: renovation of cells and tissues. cell , - ( ). . h. morishita, n. mizushima, diverse cellular roles of autophagy. annu rev cell dev biol , - ( ). . a. m. pickrell, r. j. youle, the roles of pink , parkin, and mitochondrial fidelity in parkinson's disease. neuron , - ( ). . t. j. melia, a. h. lystad, a. simonsen, autophagosome biogenesis: from membrane growth to closure. j cell biol , ( ). . j. h. hurley, l. n. young, mechanisms of autophagy initiation. annu rev biochem , - ( ). . n. mizushima, t. yoshimori, y. ohsumi, the role of atg proteins in autophagosome formation. annu rev cell dev biol , - ( ). . s. maeda, c. otomo, t. otomo, the autophagic membrane tether atg a transfers lipids between membranes. elife , ( ). . t. osawa et al., atg mediates direct lipid transfer between membranes for autophagosome formation. nat struct mol biol , - ( ). . d. p. valverde et al., atg transports lipids to promote autophagosome biogenesis. j cell biol , - ( ). . s. maeda et al., structure, lipid scrambling activity and role in autophagosome formation of atg a. nat struct mol biol , - ( ). . k.
matoba et al., atg is a lipid scramblase that mediates autophagosomal membrane expansion. nat struct mol biol , - ( ). . t. hanada et al., the atg -atg conjugate has a novel e -like activity for protein lipidation in autophagy. j biol chem , - ( ). . y. ichimura et al., a ubiquitin-like system mediates protein lipidation. nature , - ( ). . n. fujita et al., an atg b mutant hampers the lipidation of lc paralogues and causes defects in autophagosome closure. mol biol cell , - ( ). . h. nakatogawa, y. ichimura, y. ohsumi, atg , a ubiquitin-like protein required for autophagosome formation, mediates membrane tethering and hemifusion. cell , - ( ). . y. s. sou et al., the atg conjugation system is indispensable for proper development of autophagic isolation membranes in mice. mol biol cell , - ( ). . k. tsuboyama et al., the atg conjugation systems are important for degradation of the inner autophagosomal membrane. science , - ( ). . a. b. birgisdottir, t. lamark, t. johansen, the lir motif - crucial for selective autophagy. j cell sci , - ( ). . v. rogov, v. dotsch, t. johansen, v. kirkin, interactions between autophagy receptors and ubiquitin-like proteins form the molecular basis for selective autophagy. mol cell , - ( ). . j. sawa-makarska et al., cargo binding to atg unmasks additional atg binding sites to mediate membrane-cargo apposition during selective autophagy. nat cell biol , - ( ). . t. n. nguyen et al., atg family lc /gabarap proteins are crucial for autophagosome-lysosome fusion but not autophagosome formation during pink /parkin mitophagy and starvation.
j cell biol , - ( ). . f. reggiori, m. komatsu, k. finley, a. simonsen, autophagy: more than a nonselective pathway. int j cell biol , ( ). . d. gatica, v. lahiri, d. j. klionsky, cargo recognition and degradation by selective autophagy. nat cell biol , - ( ). . g. zaffagnini, s. martens, mechanisms of selective autophagy. j mol biol , - ( ). . a. stolz, a. ernst, i. dikic, cargo recognition and trafficking in selective autophagy. nat cell biol , - ( ). . v. kirkin, v. v. rogov, a diversity of selective autophagy receptors determines the specificity of the autophagy pathway. mol cell , - ( ). . v. kirkin, d. g. mcewan, i. novak, i. dikic, a role for ubiquitin in selective autophagy. mol cell , - ( ). . b. j. ravenhill et al., the cargo receptor ndp initiates selective autophagy by recruiting the ulk complex to cytosol-invading bacteria. mol cell , - e ( ). . x. shi, c. chang, a. l. yokom, l. e. jensen, j. h. hurley, the autophagy adaptor ndp and the fip coiled-coil allosterically activate ulk complex membrane recruitment. elife , ( ). . j. n. s. vargas et al., spatiotemporal control of ulk activation by ndp and tbk during selective autophagy. mol cell , - e ( ). . e. turco et al., fip claw domain binding to p promotes autophagosome formation at ubiquitin condensates. mol cell , - e ( ). . k. yamano et al., critical role of mitochondrial ubiquitination and the optn-atg a axis in mitophagy. j cell biol , ( ). . l. w. brier, m. zhang, l. ge, mechanistically dissecting autophagy: insights from in vitro reconstitution. journal of molecular biology, ( ). . y. fujioka et al., phase separation organizes the site of autophagosome formation. nature , - ( ). . j. m. alam, n. n. noda, in vitro reconstitution of autophagic processes. biochem soc trans , - ( ). . j. sawa-makarska et al., reconstitution of autophagosome nucleation defines atg vesicles as seeds for membrane formation. science , ( ). . d. fracchiolla, c. chang, j. h. hurley, s. 
martens, a pi k-wipi positive feedback loop allosterically activates lc lipidation in autophagy. j cell biol , ( ). . y. c. wong, e. l. holzbaur, optineurin is an autophagy receptor for damaged mitochondria in parkin-mediated mitophagy that is disrupted by an als-linked mutation. proc natl acad sci u s a , e - ( ). . m. lazarou et al., the ubiquitin kinase pink recruits autophagy receptors to induce mitophagy. nature , - ( ). . j. m. heo, a. ordureau, j. a. paulo, j. rinehart, j. w. harper, the pink -parkin mitochondrial ubiquitylation pathway drives a program of optn/ndp recruitment and tbk activation to promote mitophagy. mol cell , - ( ). . a. s. moore, e. l. holzbaur, dynamic recruitment and activation of als-associated tbk with its target optineurin are required for efficient mitophagy. proc natl acad sci u s a , e - ( ). . j. m. heo et al., integrated proteogenetic analysis reveals the landscape of a mitochondrial-autophagosome synapse during park -dependent mitophagy. sci adv , eaay ( ). . c. s. evans, e. l. f. holzbaur, lysosomal degradation of depolarized mitochondria is rate-limiting in optn-dependent neuronal mitophagy. autophagy , - ( ). . n. fujita et al., the atg l complex specifies the site of lc lipidation for membrane biogenesis in autophagy. mol biol cell , - ( ). . r. c. russell et al., ulk induces autophagy by phosphorylating beclin- and activating vps lipid kinase. nat cell biol , - ( ). . j. m. park et al., the ulk complex mediates mtorc signaling to the autophagy initiation machinery via binding and phosphorylating atg . autophagy , - ( ). . n.
gammoh, o. florey, m. overholtzer, x. jiang, interaction between fip and atg l distinguishes ulk complex-dependent and -independent autophagy. nat struct mol biol , - ( ). . d. fracchiolla et al., mechanism of cargo-directed atg conjugation during selective autophagy. elife , ( ). . t. nishimura et al., fip regulates targeting of atg l to the isolation membrane. embo rep , - ( ). . s. a. sarraf et al., loss of tax bp -directed autophagy results in protein aggregate accumulation in the brain. mol cell , - e ( ). . d. a. tumbarello et al., the autophagy receptor tax bp and the molecular motor myosin vi are required for clearance of salmonella typhimurium by autophagy. plos pathog , e ( ). . b. richter et al., phosphorylation of optn by tbk enhances its binding to ub chains and promotes selective autophagy of damaged mitochondria. proc natl acad sci u s a , - ( ). . r. m. alsaadi et al., ulk -mediated phosphorylation of atg l promotes xenophagy, but destabilizes the atg l crohn's mutant. embo rep , e ( ). . c. zhou et al., regulation of matg trafficking by src- and ulk -mediated phosphorylation in basal and starvation-induced autophagy. cell res , - ( ). . d. f. egan et al., small molecule inhibition of the autophagy kinase ulk and identification of ulk substrates. mol cell , - ( ). . e. karanasios et al., dynamic association of the ulk complex with omegasomes during autophagy induction. j cell sci , - ( ). . h. c. dooley et al., wipi links lc conjugation with pi p, autophagosome formation, and pathogen clearance by recruiting atg - - l . mol cell , - ( ). . c. kraft et al., binding of the atg /ulk kinase to the ubiquitin-like protein atg regulates autophagy. embo j , - ( ). . e. a. alemu et al., atg family proteins act as scaffolds for assembly of the ulk complex: sequence requirements for lc -interacting region (lir) motifs. j biol chem , - ( ). . h. nakatogawa, mechanisms governing autophagosome biogenesis. nat rev mol cell biol , - ( ). . m. zachari, m. longo, i. g. ganley, aberrant autophagosome formation occurs upon small molecule inhibition of ulk kinase activity. life sci alliance , ( ). . s. baskaran et al., architecture and dynamics of the autophagic phosphatidylinositol - kinase complex. elife , ( ). . g. stjepanovic, s. baskaran, m. g. lin, j. h. hurley, unveiling the role of vps kinase domain dynamics in regulation of the autophagic pi k complex. mol cell oncol , e ( ).

figures

fig. reconstitution of ndp and tax bp -triggered lc lipidation (a) the schematic drawing illustrates the reaction setting. the blue curve indicates the guv membrane. gray cartoons are autophagy components present in the reaction.
(b) representative confocal images showing the membrane recruitment of e complex (green) and lc b (red). guvs were incubated with wipi d, e -gfp, atg , atg , mcherry-lc b, atp/mn +, and different upstream components as listed above each image column. images taken at min and min are shown. scale bars, µm. (c and d) quantitation of the kinetics of mcherry-lc b (c) and e -gfp (d) recruitment to the membrane from individual guv tracing in a (averages of vesicles are shown, error bars indicate standard deviations). (e and f) guvs were incubated with wipi d, e -gfp, atg , atg , mcherry-lc b, atp/mn +, and different proteins listed above the images in fig. s . quantitation of the kinetics of mcherry-lc b (e) and e -gfp (f) recruitment to the membrane from individual guv tracing (averages of vesicles are shown, error bars indicate standard deviations). all results are representative of three independent experiments.

fig. reconstitution of optn-triggered lc lipidation (a) representative confocal images showing the membrane recruitment of e complex and lc b. guvs were incubated with wipi d, e -gfp, atg , atg , mcherry-lc b, atp/mn +, and different proteins as listed above each image column.
images taken at min and min are shown. scale bars, µm.
(b and c) quantitation of the kinetics of mcherry-lc b (b) and e -gfp (c) recruitment to the membrane from individual guv tracing in a (averages of vesicles are shown, error bars indicate standard deviations).
(d and e) guvs were incubated with wipi d, e -gfp, atg , atg , mcherry-lc b, gst-ub , atp/mn +, and optnwt or optns d in the presence or absence of pi kc -c . quantitation of the kinetics of mcherry-lc b (d) and e -gfp (e) recruitment to the membrane from individual guv tracing is shown (averages of vesicles, error bars indicate standard deviations).
all results representative of three independent experiments.

fig. the kinase activity of ulk is dispensable for cargo receptor-induced lc lipidation
(a) representative confocal images showing the membrane recruitment of e -gfp complex and mcherry-lc b. guvs were incubated with gst-ub , ndp , pi kc -c , wipi d, e -gfp, atg , atg , mcherry-lc b, atp/mn + in the presence or absence of ulk wt complex or ulk kinase dead complex. images taken at min are shown. scale bars, µm.
(b) quantitation of the kinetics of e -gfp complex and mcherry-lc b recruitment to the membrane from individual guv tracing in a is shown (averages of vesicles, error bars indicate standard deviations).
all results representative of three independent experiments.
fig. kinetics of ulk complex recruitment to membranes
(a) representative confocal images showing the membrane recruitment of gfp-ulk complex and mcherry-lc b. guvs were incubated with wipi d, e complex, atg , atg , mcherry-lc b, atp/mn + and different protein combinations as listed above each image column. images taken at min are shown. scale bars, µm.
(b-d) guvs were incubated with wipi d, e complex, atg , atg , mcherry-lc b, atp/mn +, and gfp-ulk complex or gfp-ulk complex together with pi kc -c complex, in the presence or absence of ndp and gst-ub (b), or tax bp and gst-ub (c), or optns d and gst-ub (d). quantitation of the kinetics of gfp-ulk complex and mcherry-lc b recruitment to the membrane from individual guv tracing is shown (averages of vesicles, error bars indicate standard deviations).
(e) representative confocal images showing the membrane recruitment of gfp-ulk complex and mcherry-lc b. guvs were incubated with gst-ub , optns d, gfp-ulk complex, pi kc -c complex, wipi d, e complex, atg , atg , mcherry-lc b, and atp/mn +, each time omitting one of the components downstream of ulk complex. scale bars, µm.
(f) quantitation of the kinetics of gfp-ulk complex and mcherry-lc b recruitment to the membrane from individual guv tracing in e is shown (averages of vesicles, error bars indicate standard deviations).
all results representative of three independent experiments.

fig. kinetics of pi kc -c recruitment to membranes
(a) representative confocal images showing the membrane recruitment of gfp-pi kc -c complex and mcherry-lc b. guvs were incubated with wipi d, e complex, atg , atg , mcherry-lc b, atp/mn + and different protein combinations as listed above each image column. images taken at min are shown. scale bars, µm.
(b-d) guvs were incubated with wipi d, e complex, atg , atg , mcherry-lc b, and gfp-pi kc -c complex or gfp-pi kc -c complex together with ulk complex, in the presence or absence of optns d and gst-ub (b), or ndp and gst-ub (c), or tax bp and gst-ub (d). quantitation of the kinetics of gfp-pi kc -c complex and mcherry-lc b recruitment to the membrane from individual guv tracing is shown (averages of vesicles, error bars indicate standard deviations).
(e) representative confocal images showing the membrane recruitment of gfp-pi kc -c complex and mcherry-lc b. guvs were incubated with gst-ub , optns d, gfp-pi kc -c complex, wipi d, e complex, atg , atg , mcherry-lc b, and atp/mn +, each time omitting one of the components downstream of pi kc -c complex. scale bars, µm.
(f) quantitation of the kinetics of gfp-pi kc -c complex and mcherry-lc b recruitment to the membrane from individual guv tracing in e is shown (averages of vesicles, error bars indicate standard deviations).
all results representative of three independent experiments.

fig. interactions between cargo receptors and the core autophagy machinery
(a) the schematic drawing illustrates the bead-based pull-down setting.
(b-f) representative confocal images showing recruitment of gfp-ulk complex (b), gfp-pi kc -c complex (c), e -gfp (d), gfp-wipi d (e) or mcherry-lc b (f) to beads coated with gst, gst-ndp , gst-tax bp or gst-optns d. a mixture of gst or gst-tagged cargo receptors with different fluorescent protein-tagged autophagy components was incubated with gsh beads for h, and images were taken and shown. scale bars, µm.
(g) the quantification of gfp or mcherry signal on beads is shown (averages of beads, error bars indicate standard deviations).
all results representative of three independent experiments.
fig. model for cargo receptor-mediated lc lipidation
for selective autophagy that degrades targets relying on ubiquitination signals, cargo receptors such as ndp , tax bp , or optn first bind to ubiquitinated cargos and recruit multiple distinct autophagy machineries through a multivalent web of weak interactions; these components work together to trigger membrane association of lc family proteins.

supplementary materials

supplementary figure legends

fig. s purification of core autophagy machinery
all purified autophagy components were resolved on a % sds page and shown by coomassie brilliant blue stain.

fig. s characterization of ulk complex with fip d -
(a) the schematic drawing shows the domain structure of fip .
(b) purified fip full-length or fip d - was resolved on a % sds page and shown by coomassie brilliant blue stain.
(c) negative stain em single particles of fip d - .
(d) histogram of fip d - path length and end-to-end distances.
(e) guvs were incubated with gst-ub , ndp , pi kc -c , wipi d, e , atg , atg , mcherry-lc b, atp/mn + in the presence of ulk wt complex or ulk complex with fip d - . quantitation of the kinetics of mcherry-lc b recruitment to the membrane from individual guv tracing is shown (averages of vesicles, error bars indicate standard deviations).

fig. s reconstitution of tax bp -triggered lc lipidation
representative confocal images showing the membrane recruitment of e -gfp and mcherry-lc b. guvs were incubated with wipi d, e -gfp, atg , atg , mcherry-lc b, atp/mn +, and different upstream components as listed above each image column, respectively. images taken at min and min are shown. scale bars, µm.

fig. s the kinetics of e and lc membrane recruitment
quantitation of the kinetics of e -gfp and mcherry-lc b recruitment to the membrane from individual guv tracing is shown (averages of vesicles, error bars indicate standard deviations).
the guvs were incubated with gst-ub , ulk complex, pi kc -c , wipi d, e , atg , atg , mcherry-lc b, atp/mn + in the presence of ndp , tax bp or optns d. the data were fitted to a boltzmann sigmoidal curve in graphpad prism , and t / was calculated.

fig. s the kinase activity of ulk is dispensable for optn-induced lc lipidation
(a) representative confocal images showing the membrane recruitment of e -gfp complex and mcherry-lc b. guvs were incubated with gst-ub , optns d, pi kc -c , wipi d, e -gfp, atg , atg , mcherry-lc b, atp/mn + in the presence or absence of ulk wt complex or ulk kinase dead complex. images taken at min are shown. scale bars, µm.
(b) quantitation of the kinetics of e -gfp complex and mcherry-lc b recruitment to the membrane from individual guv tracing in c is shown (averages of vesicles, error bars indicate standard deviations).
all results representative of three independent experiments.
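the boltzmann sigmoidal fit mentioned in the kinetics legend above can be illustrated with a minimal python sketch. this is not the authors' analysis, which was done in graphpad prism. it assumes the common four-parameter form y = bottom + (top - bottom) / (1 + exp((t_half - t) / slope)), in which the midpoint parameter is the half-time. the function names `boltzmann` and `fit_t_half` are hypothetical, the coarse grid search is a stand-in for proper nonlinear regression, and the trace is synthetic.

```python
import math


def boltzmann(t, bottom, top, t_half, slope):
    """Boltzmann sigmoid rising from `bottom` to `top`; half-maximal at t_half."""
    return bottom + (top - bottom) / (1.0 + math.exp((t_half - t) / slope))


def fit_t_half(times, signal, steps=200):
    """Coarse least-squares grid search over t_half and slope.

    Assumption: the trace reaches both plateaus, so bottom and top are
    fixed to the minimum and maximum of the observed signal.
    """
    bottom, top = min(signal), max(signal)
    t_lo, t_hi = min(times), max(times)
    span = t_hi - t_lo
    best = None  # (sum of squared errors, t_half, slope)
    for i in range(steps + 1):
        t_half = t_lo + span * i / steps
        for j in range(1, steps // 2 + 1):
            slope = span * j / steps  # strictly positive slope factor
            sse = sum(
                (boltzmann(t, bottom, top, t_half, slope) - y) ** 2
                for t, y in zip(times, signal)
            )
            if best is None or sse < best[0]:
                best = (sse, t_half, slope)
    return best[1], best[2]


# synthetic recruitment trace (arbitrary units), not data from the study
times = list(range(0, 61, 2))
signal = [boltzmann(t, 0.0, 1.0, 21.0, 3.0) for t in times]
t_half, slope = fit_t_half(times, signal)
print(f"t_half ~ {t_half:.1f}, slope ~ {slope:.1f}")
```

fixing the plateaus to the observed extremes keeps the search two-dimensional; for traces that do not saturate, all four parameters would have to be fitted.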
table s

construct | vector | expression system | published
gst-tevcs-fip -mbp | pcag | hekgnti | ( )
gst-tevcs-fip d - -mbp | pcag | hekgnti | this study
atg | pcag | hekgnti | ( )
gfp-atg | pcag | hekgnti | ( )
gst-tevcs-atg | pcag | hekgnti | ( )
gst-tevcs-gfp-atg | pcag | hekgnti | ( )
mbp-tsf-tevcs-ulk | pcag | hekgnti | ( )
mbp-tsf-tevcs-ulk k i | pcag | hekgnti | this study
gst-tevcs-atg | pcag | hekgnti | ( )
gst-tevcs-gfp-atg | pcag | hekgnti | ( )
tsf-tevcs-vps | pcag | hekgnti | ( )
tsf-tevcs-becn | pcag | hekgnti | ( )
vps | pcag | hekgnti | ( )
wipi d-tevcs-tsf | pcag | hekgnti | ( )
gfp-wipi d-tevcs-tsf | pcag | hekgnti | this study
atg - xhis-tevcs-atg - xhis-tevcs-atg l -tevcs-strepii-atg -atg | pgbdest | sf | ( )
atg - xhis-tevcs-atg - xhis-tevcs-atg l -gfp-tevcs-strepii-atg -atg | pgbdest | sf | ( )
xhis-tevcs-atg | pfast bacht(b) | sf | ( )
xhis-tevcs-atg | pet duet- | e. coli | ( )
xhis-tevcs-mcherry-lc b-gly(∆ c) | pet duet- | e. coli | ( )
gst-ub | pgex | e. coli | ( )
gst-ndp | pgst | e. coli | ( )
gst-tax bp | pgst | e. coli | this study
gst-optn | pgst | e. coli | this study
gst-optns ds d | pgst | e. coli | this study

intramolecular quality control: hiv- envelope gp signal-peptide cleavage as a functional folding checkpoint

nicholas mccaul, matthias quandte, ilja bontjer, guus van zadelhoff, aafke land, rogier w.
sanders, ineke braakman*
these authors contributed equally
cellular protein chemistry, bijvoet center for biomolecular research, science life, faculty of science, utrecht university, padualaan , ch, utrecht, the netherlands
department of medical microbiology, laboratory of experimental virology, center for infection and immunity amsterdam (cinima), academic medical center, meibergdreef , az, amsterdam, the netherlands
department of microbiology and immunology, weill medical college of cornell university, new york, ny, usa
present address: program in cellular and molecular medicine, boston children’s hospital, boston, ma, usa.
present address: dr heinekamp benelux b.v., leidse rijn , pz, de meern, the netherlands
present address: hogeschool utrecht, institute of life sciences, fc dondersstraat , je, utrecht, the netherlands
*lead contact: i.braakman@uu.nl
.cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . /

summary
removal of the membrane-tethering signal peptides that target secretory proteins to the endoplasmic reticulum is a prerequisite for proper folding. while signal peptides are generally thought to be removed well before translation termination, we here report two novel post-targeting functions for the hiv- gp signal peptide, which remains attached until gp folding triggers its removal. first, the signal peptide improves the fidelity of folding: it enhances the conformational plasticity of gp by driving disulfide isomerization through a redox-active cysteine, and at the same time delays folding by tethering to the membrane the n-terminus, which needs to assemble with the c-terminus.
second, its carefully timed cleavage represents intramolecular quality control and ensures release and stabilization of (only) natively folded gp . postponed cleavage and the redox-active cysteine are both highly conserved and important for viral fitness. considering the ~ % of secretory proteins in our genome and the frequency of n-to-c contacts in protein structures, these regulatory roles of the signal peptide are bound to be more common in secretory-protein biosynthesis.

keywords: endoplasmic reticulum, gp , disulfide bond, redox-active cysteine, protein folding, signal peptide, membrane tethering

introduction
the endoplasmic reticulum (er) is home to a wealth of resident chaperones and folding enzymes that cater to approximately a third of all mammalian proteins during their biosynthesis (ellgaard et al., ; kanapin et al., ). it is the site of n-linked glycan addition and disulfide-bond formation, both of which contribute to protein folding, solubility, stability, and function. targeting to the mammalian er is generally mediated by n-terminal signal peptides, which direct the ribosome-nascent chain complex to the membrane and initiate co-translational translocation (blobel and dobberstein, ; görlich et al., ; görlich et al., ; jackson and blobel, ; lingappa et al., ; walter, ). for soluble and type-i transmembrane proteins, the n-terminal signal peptide is - amino acids long and contains a cleavage site recognized by the signal peptidase complex (von heijne, ). while a great deal of sequence variation occurs between signal sequences, conserved features do exist.
these include a positively charged, n-terminal n-region, a hydrophobic h-region and an er-lumenal c-region (von heijne, , , ). classic paradigm-establishing studies showed that cleavable signal peptides are removed co-translationally, immediately upon exposure of the cleavage site in the er lumen (blobel and dobberstein, ; jackson and blobel, ). this would imply that signal peptides function only as cellular postal codes and that signal-peptide cleavage and folding are independent events. evidence is emerging, however, that increased nascent-chain lengths are required for cleavage (daniels et al., ; hegde and bernstein, ; rutkowski et al., ), indicating that the signal peptidase does not cleave each consensus site immediately upon translocation into the er lumen. examples are the influenza-virus hemagglutinin, in which signal-peptide cleavage occurs on the longer nascent chain, well after glycosylation (daniels et al., ), edem (tamura et al., ), human cytomegalovirus (hcmv) protein us (rehm et al., ), and hiv- envelope glycoprotein gp (li et al., , ), suggesting that signal peptides can function as more than mere postal codes. late signal-peptide cleavage is easily overlooked because western blots lack temporal resolution and may mask small mass differences.
gp is the sole antigenic protein on the surface of the hiv- virion and mediates hiv- entry into target cells (wyatt and sodroski, ).
it folds and trimerizes in the er, leaves the er upon release by chaperones and packaging into copii-coated vesicles, and is cleaved by golgi furin proteases into two non-covalently associated subunits: the soluble subunit gp (figure a, in colors), which binds host-cell receptors, and the transmembrane subunit gp (figure a, uncolored), which contains the fusion peptide (decroly et al., ; earl et al., ; earl et al., ; hallenberger et al., ; wyatt and sodroski, ). the so-called outer-domain residues [according to (pancera et al., )] are colored in pink (figure a), the inner domain, which folds from more peripheral parts of the gp sequence, in grey, and the variable loops in green.
correct function of gp requires proper folding, including oxidation of the correct cysteine pairs into disulfide bonds (bontjer et al., ; land and braakman, ; land et al., ; sanders et al., ; snapp et al., ). disulfide-bond formation and isomerization in gp begin co-translationally, on the ribosome-attached nascent chain, and continue long after translation, until the correct set of ten conserved disulfide bonds has been formed (land and braakman, ; land et al., ). the soluble subunit gp can be expressed independently of gp and folds with kinetics highly similar to those of gp (land et al., ). signal-peptide cleavage only occurs once gp attains a near-native conformation and requires both n-glycosylation and disulfide-bond formation, but is gp independent (land et al., ; li et al., ).
mutation-induced co-translational signal-peptide cleavage changes the folding pathway of gp and is disadvantageous for viral function (pfeiffer et al., ; snapp et al., ).
given the interplay between signal-peptide cleavage and gp folding, we set out to investigate the mechanism that drives post-translational cleavage and its relevance for gp folding and viral fitness. we used various kinetic oxidative-folding assays on gp combined with functional studies on recombinant hiv strains encoding gp mutants, and discovered a novel role for the er-targeting signal peptide as a quality-control checkpoint and folding mediator. a conserved cysteine in the membrane-tethered signal peptide drives disulfide isomerization in the gp ectodomain until gp folding triggers signal-peptide cleavage and release of the n-terminus. we uncovered this functional, mutual regulation as an intramolecular quality control that ensures native folding of a multidomain glycoprotein.

results

signal-peptide cleavage requires the gp c-terminus
of the nine disulfide bonds in gp , five are critical for proper folding and signal-peptide cleavage, three in the constant regions of gp in the (grey) inner domain, and two in the outer (pink) domain at the base of variable loops (green) v and v [figure a, (van anken et al., )]. gp undergoes extensive disulfide isomerization during its folding process, as seen from the smear of gp folding intermediates (it) in the non-reducing gel upon s-radiolabeling, from reduced gp down to beyond native gp [(land et al., ); figure b, nr, ' chase].
this smear gradually disappears into a native band with discrete mobility, at around the time the signal peptide is removed (figure b r, from ' chase). yet, it is far from obvious which aspect of gp folding triggers signal-peptide cleavage.
we therefore took a linear approach: we prepared c-terminal truncations of gp , from -aa length ( x, a gp molecule truncated after position ) to full length, and analyzed in which of them the signal peptide was cleaved (figure s ). radioactive pulse-chase experiments showed that only full-length gp ( residues long) and x lost their signal peptides, but that in all shorter forms, including x, the signal peptide remained uncleaved and hence attached to the protein (figure s ).
we continued with a time course for x and x to examine and compare their folding pathways (figure b). both truncations encompass the entire gp sequence except for the last and amino acids, respectively (figure a). like wild-type gp , immediately after pulse labeling (synthesis) the c-terminally truncated mutants ran close to the position of reduced gp in non-reducing sds-page. the x truncation formed disulfide bonds towards a compact structure, as the folding intermediates it ran lower in the gel than reduced protein. it failed, however, to form native gp (nt, figure b) or another stable intermediate. instead it acquired compactness far beyond the mobility of nt and remained highly heterogeneous, suggesting the formation of long-distance disulfide bonds that increased compactness and hence electrophoretic mobility (figure b, cells nr, '- h).
only a fraction, if any, of the x mutant lost its signal peptide or acquired competence to leave the er and be secreted (figure b, medium), even though all cysteines were present in the x gp protein. in contrast, addition of only residues made x behave like wild-type gp : cleavage rate and secretion were indistinguishable (figure b). oxidative folding progressed similarly as well, except for a transient non-native disulfide-linked population that ran more compact than nt (figure b, cells nr, - h) and disappeared over time (cells nr, - h).
we concluded that signal-peptide cleavage required synthesis and folding of more than out of the amino acids of gp . the downstream amino acids in the c region in the inner (grey) domain (figure a, teal) triggered the switch from non-cleavable to cleavable.

a pseudo salt bridge in the inner-domain β-sandwich controls signal-peptide cleavage and gp function
amino acids - form a β-strand (figures a and b, teal, β ), which is part of the β-sandwich in the inner (grey) domain of gp (figures a, a, and b) [coding of strands and helices from (garces et al., )]. this β-sandwich is formed by interactions of seven β-strands in constant domains c , c , and c (figures a, a, and b). six of the strands are close to the n-terminus and the th strand, β (teal), is contributed by the c-terminal c region. as the addition of β triggered signal-peptide removal, we hypothesized that the complete and properly folded β-sandwich was the minimal requirement for cleavage.
to address this, we designed charge mutants aimed to prevent assembly of β with the n-terminal part of the β-sandwich (figure b). in the high-resolution crystal structure (garces et al., ) k (in β ) forms hydrogen bonds with e (in β ), e (in β ), and the main-chain oxygen of n (in β ) (figure b). we created charge-reversed mutants of the n-terminal glutamates (e k and e k), the c-terminal lysine (k e) and combinations thereof (figures c and s ). we did not include n in our mutagenesis study since its interaction involves the main-chain oxygen, which cannot be removed; we considered this inappropriate for our question. as gp is the dominant subunit in gp folding and signal-peptide cleavage, and allows more detailed analysis because it is smaller than gp , we subjected wild-type gp and all mutants to pulse-chase analysis of their oxidative folding, signal-peptide cleavage, and secretion (figures c-e and s a-d).
like the c-terminal truncations ( x, figure b), all charge mutants in the β-sandwich formed gp molecules with higher electrophoretic mobility than native gp (nt), implying appreciable non-native long-range disulfide bonding, persisting at all time points or disappearing into aggregates (figure s b). k e showed the strongest phenotype: minimal formation of nt and a much-delayed signal-peptide cleavage (figures c, e and s b, cells nr and r, band rc). this folding step was crucial for function, as k e mutant virus was non-infectious (figure f). a striking rescue of the strong folding defect of k e was effected by the charge reversal at the n-terminus: the double mutant e k k e displayed improved gp oxidation (cells nr), signal-peptide cleavage (cells r), and secretion (medium) (figures c-e and s c), and rescued infectivity to ~ % of wild
type (figure f).
all n-terminal e-to-k mutants (e k, e k) oxidized and accumulated in a native-like position in gel but failed to form much of the sharp nt band seen in wild type (figure c and s a). the start of signal-peptide cleavage of both single mutants was delayed to - min after synthesis and total cleavage was - % lower than wild type by h (cells r and figure e and s c). as a result, the secretion of all n-terminal mutants was decreased by ≥ % in h compared to wild type (medium, figures c, d and s a, c). the redundancy of two negative charges likely contributes to the intermediate folding phenotype of the n-terminal mutants and the ~ % residual viral infectivity of the e k mutant (figure f).
we concluded that the c-terminal β-strand was essential for proper folding of the β-sandwich in the inner domain, which completes folding of gp and triggers signal-peptide cleavage. timing of cleavage hence represents a checkpoint for proper folding of gp .

retention of the signal peptide causes hypercompacting of gp
during folding, gp undergoes extensive disulfide formation and isomerization before reaching its native state. these intermediates appear as “waves” on sds-page representing varying degrees of compactness of folding intermediates (land et al., ). because mutants of gp that exhibited delayed or absent cleavage all formed hypercompact forms that ran below nt (figures b and c), we asked whether this heterogeneous electrophoretic mobility represented continued isomerization of gp . we substituted the alanine in the - -position relative to the cleavage site for a valine, to
prevent cleavage by the signal-peptidase complex (figure g, cells r). initial oxidative folding of a v was similar to that of wild type (figure g, nr). the a v mutant, however, did not form a native band but populated all forms, from reduced to hypercompact oxidized, probably isomerizing continuously. the monomeric forms gradually disappeared into disulfide-linked, sds-insoluble aggregates that increased in size and eventually became too large to enter the gel (figure g, agg). in both wild-type and a v gp , an endoglycosidase h-resistant band appeared over time (figure g, ehr). for wild-type gp this represents molecules that have transited through the golgi complex and acquired an n-acetylglucosamine residue on their n-glycans but have yet to be secreted. for a v gp this population may be due to the inaccessibility of some sugars for removal due to formation of sds-insoluble aggregates. we concluded that retention of the signal peptide either promotes formation of these hypercompact forms or prevents recovery from them. because all signal-peptide-retaining mutants showed a high propensity for aggregation, it is likely that these sds-insoluble aggregates are composed of hypercompact forms of gp . tethering the n-terminus appears beneficial for folding, but release of gp from both tether and isomerization-driving cysteine is vital for stabilization of the acquired native fold and release from the er.

the cysteine in the signal peptide interacts with cysteines in gp
as the signal peptide stays attached to gp for at least min after chain termination it influences both co- and post-translational folding, through the tethering of the n-terminus to the membrane, as well as through interactions with the mature gp sequence (snapp et al., ). the hypercompacting in non-cleaved mutants by continued disulfide isomerization (figure g) implies that an unpaired cysteine must be available to keep attacking formed disulfide bonds. opening once-formed disulfides may improve folding yield as the folding protein regains conformational freedom, and at the same time has a chance to recover from non-native disulfide bonding. existing disulfides may be attacked by a cysteine from an oxidoreductase in the er, or by an intramolecular cysteine in gp [as shown for bpti (weissman and kim, , )]. the unpaired cysteine in position within the signal peptide is a likely interaction candidate, because it is part of the consensus sequence for the signal peptidase and as such (partially) exposed to the er lumen. during translocation, c may interact with gp cysteines while they pass through the translocon. folding analysis as in figures and however showed that mutating c had no detectable effect on oxidative folding (figure a, c a): folding intermediates disappeared, folded nt appeared, and the signal peptide was cleaved similarly and at similar times as wild-type gp . either c a was identical to wild type or differences are missed due to asynchrony of the folding gp population.
to amplify mobility differences, we alkylated with iodoacetic acid, which adds a charge to each free cysteine it binds to. to better synchronize the folding cohort, we modified the pulse-chase protocol with a preincubation with puromycin to release unlabeled nascent chains before labeling and added cycloheximide in the chase media to block elongation of radiolabeled nascent chains. at each chase time, gp c a ran higher on a reducing gel (figure s ), indicating that it has more free cysteines as it bound more iodoacetic acid than wild-type gp . this could be due to either slower disulfide-bond formation, faster disulfide-bond reduction, or a combination of both, which suggests a role for c in the net gp disulfide formation or isomerization during folding. the importance of c became clear when we removed disulfide bond - in c . deletion of disulfide - prevents signal-peptide cleavage, but allows gp to reach a compact position just above natively folded protein nt [figure b, (van anken et al., )]. when c a was introduced into the - deletion, folding intermediates were blocked at a much earlier phase and remained significantly less compact (figure b). the phenotype was the same when we combined c a with the individual deletions of c or c (figure s ). c a not only prevented formation of compact folding intermediates, it also increased their heterogeneity. as c deletion aggravated folding defects of - disulfide bond mutants, c must have partially compensated for the - folding defect by participating in oxidative folding.
we concluded that c a in the signal peptide was important for oxidative folding of incompletely folded gp , most likely for sustaining isomerization of non-native disulfide bonds, and is partially redundant with the - cysteines for this process. to analyze whether c , in addition to redundancy, interacted directly with the - disulfide bond we used a amino-acid truncation ( x, figure c) for simplicity as it retained the signal peptide and contains a single native cysteine pair. because formation of this disulfide bond was not detectable by comparing reduced and non-reduced samples, we made use of an alkylation-switch assay [figure d, (appenzeller-herzog and ellgaard, )]. in short, we radiolabeled cells expressing x and blocked free cysteines with nem. cells then were homogenized and denatured in % sds and incubated again with nem to block any free cysteines previously shielded by structure. after immunoprecipitation and reduction of disulfide bonds with tcep, we alkylated resulting free cysteines with mpeg-maleimide , , which adds ~ kda of mass for each cysteine alkylated. samples were immunoprecipitated again to remove mpeg and were analyzed by non-reducing - % sds-page (figure e). the x construct showed only weak disulfide-bond formation, with only ~ % of molecules forming a disulfide bond (figure e, wt). upon removal of the signal-peptide cysteine c however, the population that contained a disulfide bond increased significantly to ~ % (figure f). the presence of c thus further destabilized the already unstable - disulfide bond.
the non-native disulfide bond - in the c a mutant barely formed, whereas the - disulfide bond in the c a mutant was highly variable (figure f). this suggests that disulfide bonds involving the signal-peptide cysteine are unstable and may only occur transiently, a feature consistent with a transient role in disulfide isomerization.

the n-terminal cysteines form long-range disulfides during early gp folding

gp undergoes constant disulfide isomerization during folding (land et al., ) and prolonged association of the signal peptide appears to intrinsically sustain isomerization and may further destabilize the already unstable - disulfide bond. moreover, we have shown redundancy of c with disulfide bond - , which is why we asked whether the three n-terminal cysteines in gp were taking part in long-range disulfide bonds during folding. we removed the v v variable loops, which are not essential for folding and function (bontjer et al., ), and inserted a cleavage site for the protease thrombin through mutagenesis (l r). this removed all disulfide bonds in v v , - and - by the loop deletion and - by mutation (c - a). we named the resulting construct gp th. reduction after cleavage produces an n-terminal fragment of ~ kda containing the signal peptide and the n-terminal cysteines c , c , and c , plus a ~ kda fragment containing the rest of gp (figure a).
if long-range disulfides between the n and c-terminal fragments indeed exist, the cleaved, non-reduced molecule should run in the same position as uncleaved in non-reducing conditions, and should dissociate into the fragments under reducing conditions. radioactive pulse-chase experiments as described above were modified: instead of deglycosylation with endoh we denatured the protein with . % sds and cleaved gp with . u thrombin. the fragments were separated by - % discontinuous sds-page. as expected, gp that lacked all n-terminal cysteines did not form any long-distance disulfide bonds (figure b and c, c a c - a). we confirmed that wild-type gp contained long-distance disulfide bonds between n-terminal cysteines and the rest of the molecule during early folding (figure b and c, wt). removal of c significantly reduced the number of molecules with a long-range disulfide bond (figure d), likely due to increased stability of the - disulfide bond in the absence of c . strikingly, all cysteine mutants that retained a single cysteine yielded some long-distance disulfides, suggesting that all three n-terminal cysteines could form a non-native pair with downstream cysteines in gp (figure b-d).

removal of v /v disulfides causes more rapid gp folding

perhaps counterintuitively, the thrombin-cleavage construct (gp th) folded faster than full-length gp (figure a). directly after the pulse, gp th ran as a more diffuse band whereas full-length gp (gp wt) remained close to the reduced-gp mobility (figure a nr).
this increased compactness shows that gp th had already formed more or larger-loop-forming disulfide bonds (snapp et al., ). as a result, signal-peptide cleavage of gp th was faster: almost complete for gp th after a -hour chase, compared to ~ % cleaved of gp wt (figure a, r). the - disulfide-bond mutants in gp -wt background fold to a stable intermediate just above the native position [figures b and s , (van anken et al., )] whereas the same mutants lacking v v (in gp th) failed to accumulate in a single band (figure b), reminiscent of the folding of gp c a c - a (figure b). this indicates that v v deletion phenocopied c deletion in the - mutants. redundancy of v v with c was confirmed by the lack of additional effect of c removal in the gp th - mutants (figure b).

c a results in decreased hiv- production and pseudovirus infectivity

as the biochemical data suggested a role for c in gp folding, we examined the effect of c a mutation on viral production and infectivity. for this we transfected cells with a molecular clone containing the full hiv genome (lai strain), containing either wild-type gp or the c a mutant. as the reading frames of env and hiv- vpu overlap and mutations in the signal peptide of gp can cause changes in the c-terminus of vpu, we produced the viruses in hek t cells, which are deficient in cd and tetherin and therefore do not require vpu to enhance virus production (van damme et al., ). we consistently detected significantly less c a hiv virus than wild-type hiv produced (figure a).
strikingly, the c a virus was significantly more infectious than wild-type hiv but displayed strong heterogeneity in infectivities, indicative of heterogeneity in c a gp incorporated into the virions (figure b). due to the severe deficit in virus production, despite increased infectivity, c a-gp -containing hiv is not likely to be competitive in nature. indeed, alignments of > , gp sequences from across all subtypes show that c is ~ % conserved (www.hiv.lanl.gov). to uncouple differences in virus production from infectivity we moved to a pseudovirus system, which allows analysis of the effect of c a gp on infectivity alone (figure c and d). as expected, we found very little difference in virus production between wild-type and c a gp (figure c). infectivity of the c a gp pseudovirus was roughly % less than the infectivity of wild type (figure d). we concluded therefore that gp conformation, its function, and as a result hiv, suffered from the removal of the signal-peptide cysteine.

discussion

during and for - min after synthesis, the n-terminus of hiv- gp remains tethered to the er membrane by its transient signal anchor. we here show that conformational plasticity is enhanced through the cysteine in the signal peptide driving disulfide isomerization, in part via the - disulfide bond, until the gp c-terminus has assembled with the n-terminus, completing the inner-domain β-sandwich and gp folding (figure ). this triggers signal-peptide cleavage, removing c from the protein, halting further isomerization and stabilizing the native gp form.
this intramolecular quality-control process is essential for viral fitness of hiv and can be impaired and restored by single charge reversals in the gp inner domain.

hierarchy of gp folding

the inner domain with the β-sandwich and the outer domain, which contains a stacked double β-barrel, together constitute the minimal folding-capable “core” of gp [(figure a), (garces et al., ; kwong et al., )]. gp is completed with the surrounding variable loops v v , v , and v (green in figures a, a and b). the core contains six of the nine disulfide bonds in gp , including the five that are essential for correct folding and signal-peptide cleavage (van anken et al., ). the inner-domain β-sandwich consists of seven strands, six of which are n-terminal. we here show that proper folding of the sandwich requires assembly with the c-terminal β strand and formation of the five essential disulfide bonds, which then leads to signal-peptide cleavage [figure a, (van anken et al., )]. some folding of the gp 'hairpin' begins during translation and translocation into the er (land et al., ), but the bulk occurs post-translationally (figure ). the (pink) outer domain is the first complete domain to emerge from the translocon and has low contact order, meaning that folding does not require integration of distal residues (figure ). it is the first domain to fold, which requires formation of the two native disulfide bonds ( - and - ) in the β-barrel: gp lacking either disulfide bond barely folds past the reduced position in sds-page (sanders et al., ; van anken et al., ).
the (grey) inner domain of gp folds next: individual deletions of its three essential disulfide bonds ( - , - , and - ) fold into more compact structures than the outer-domain deletions (van anken et al., ). the most intriguing is the - disulfide: the c - a mutant accumulates in a sharp band just above the native position, retaining its signal peptide. in contrast, c - a and c - a fail to form defined intermediates (van anken et al., ), suggesting that these β-sandwich-embracing disulfides (figure a) stabilize the inner domain. folding of the inner domain leads to cleavage of the signal peptide. until that time, the signal peptide acts as signal anchor because it adopts an α-helical conformation that extends past the cleavage site and prevents proteolytic cleavage (snapp et al., ). folding and integration of the inner domain must break the helix and allow cleavage to occur, as in crystal structures of gp this early helical region is a β-strand (garces et al., ). as the disulfide bonds of v /v and gp are dispensable for signal-peptide cleavage and their deletion shows no aberration in the folding pathway (van anken et al., ), those domains likely fold after or independent of the outer and inner domains (figure ). this is underscored by gp folding and signal-peptide cleavage being largely independent of gp (land et al., ), and by n- and c-terminal sequences in the gp inner domain forming the binding site of gp (garces et al., ; julien et al., ; lyumkis et al., ).
gp binding may explain apparent inconsistencies between folding and function of some inner-domain mutants (garces et al., ; yang et al., ). the conserved gp binding site on the gp inner-domain β-barrel also may explain the conservation (and hence value) of the intramolecular quality-control system: it ensures proper folding of this binding site, with high fidelity and well timed, before gp folding. the regulation of signal-peptide cleavage by folding of gp implies that the formation of the β-sandwich generates sufficient force to break the attached α-helix and expose the cleavage site. alpha-helical proteins have lower mechanical strength than β-sheet proteins, which often need to resist dissociation and unfolding; lower mechanical strength facilitates conformational changes to expose transient binding sites or allow signaling (chen et al., ). only ~ pn indeed suffices for exposure of a protease-cleavage site in an α-helix: for proteolytic activation of notch, cleavage of the nrr domain by adam (gordon et al., ), and of the talin r domain, a -helix bundle (del rio et al., ; yao et al., ); the von willebrand factor a domain requires ~ pn (zhang et al., ). pulling apart a β-sheet protein such as ig domains, ospa, or ubiquitin by shearing needs > pn (brockwell et al., ; carrion-vazquez et al., ; hertadi et al., ), with less force required for peeling (brockwell et al., ). physiological forces measured so far do not exceed ~ pn, a force level at which proteins in general may be destabilized already (chen et al., ).
the α-helical region around the cleavage site in gp thus would lose the stability competition with the β-sheet in the inner domain, if their structures are incompatible; indeed, in the gp structure this α-helical region is a β-strand (figures a and ). first-time folding, i.e. the completion of the inner-domain β-sandwich by assembly of β , is likely to generate sufficient force as well, as - pn allows constant binding of a filamin β-strand to a β-sheet (rognoni et al., ). formation of the inner-domain disulfide bonds may further raise the stabilizing force (eckels et al., ). the completion of gp therefore likely generates the ~ -pn force needed to break the α-helix and allow the signal peptidase to cleave off the gp signal peptide.

effects of the attached signal peptide

the postponed cleavage of the signal peptide makes it a transient signal anchor, which acts as membrane tether. this limits conformational freedom and benefits folding, i.e. the integration of the c-terminal β-strand into the folded inner-domain β-sandwich, as in knotted proteins (soler and faisca, ). the prolonged proximity of the free signal-peptide cysteine ( ) supports disulfide isomerization and increases conformational plasticity during gp folding. gp requires a native set of disulfide bonds to attain its functional d structure, but already during synthesis non-native disulfides are formed, which reshuffle over time (land et al., ).
the isomerization is detected as “waves” in sds-page, where the heterogeneous population of folding intermediates oscillates between higher and lower compactness over time [(figure a),(land et al., )]. despite extensive isomerization during folding, wild-type gp only transiently occupied forms more compact than native. in contrast, the various β-sandwich and uncleavable mutants extensively populated hypercompact states with non-native long-range disulfide bonds, indicating that without stable assembly of the n- and c-termini and resulting retention of the signal peptide, isomerization continues unabated and drives the formation of these hypercompact structures. the constant disulfide isomerization is sustained by the redox-active cysteine in the signal peptide, as its sulfhydryl group is free to attack existing disulfide bonds. once gp folding has reached a state where isomerization is no longer preferred (n- and c-termini in the inner domain assembled), the signal peptide is cleaved, removing an important, conserved driving force behind isomerization. cleavage of the signal peptide then acts as a sink because it removes the disulfide-attacking cysteine and pulls the folding equilibrium to the native structure.

mode of action of c

in a short construct, c favored a disulfide bond with c (figure f). despite a limited ability to form disulfide bonds with cysteines downstream of the - bond (figure d), deletion of c did not aggravate the c - a defective phenotype in the absence of v v .
c hence likely sustains isomerization primarily by constant attack and destabilization of the - disulfide bond. this propagates a free sulfhydryl group downstream through the folding protein, as the result of an intramolecular electron-transport chain from the more c-terminal cysteines via c and c to c (figure ). in the -residue gp chain ( x), essentially a mimic of a released gp nascent chain, in maximally % of molecules the - bond had formed, demonstrating its inherent instability as well as the likelihood that c already acts on and during translation. only in the presence of v v , c showed redundancy with the - disulfide, implying that c can fulfill roles otherwise played by c and c (and vice versa). this suggests that c is involved in folding (and isomerization of v v ) and may play this role by direct interaction with v v cysteines, in absence of c and c , distinct from its - -mediated role in downstream disulfide-bond formation. we cannot exclude that the attack of c on the v v disulfides leads to an alternative electron-transport chain from c-terminal cysteines via v v to c . either way, the redox-active c needs to be removed at the end of gp folding to ensure stability of the gp conformation.

intramolecular oxidoreductase and quality control for proper folding

built-in isomerase activity may seem redundant considering that gp folds in the er, a compartment that contains > protein-disulfide-isomerase family members (jansen et al., ). these oxidoreductases are large, bulky proteins however, which cannot
catalyze disulfide-bond formation once areas have attained significant tertiary structure. isomerase activity built into the folding protein allows free-thiol propagation (in essence electron transport) in areas otherwise unreachable by folding enzymes. this perhaps should not be too surprising given that in folded proteins the majority of cysteines are solvent inaccessible (srinivasan et al., ) and during folding disulfide bonds become resistant to reduction with dtt (tatu et al., ; tatu et al., ). an example of intramolecular disulfide isomerization is the cysteine in the pro-peptide of bovine pancreatic trypsin inhibitor (bpti), which increased both the rate and yield of bpti folding (weissman and kim, ). the majority of disulfide formation during in-vitro folding of bpti results from intramolecular disulfide rearrangements (creighton et al., ; darby et al., ; weissman and kim, ). transfer of free thiols between lumenal and transmembrane domains in the er has been demonstrated for vitamin-k-epoxide reductase (liu et al., ; schulman et al., ), indicating that such exchanges are possible. while c is located in the transmembrane α-helix (snapp et al., ), suggesting immersion in the membrane, sliding of transmembrane domains up and down in the membrane is possible (borochov and shinitzky, ; danielson et al., ; mowbray and koshland, ). as c is part of the consensus sequence for signal-peptide cleavage, it likely is exposed to the er lumen at least part of the time. not only intramolecular oxidoreductase activity but also intramolecular quality control offers an advantage. release of a protein from the er requires its folding to the extent that chaperones do not bind anymore, for instance due to shielding of hydrophobic residues
from hsp chaperones. the intramolecular quality control we here describe ensures a much more subtle regulation of conformational quality. gp 's function as hiv- fusion protein requires the native interaction between gp and the gp inner domain. only proper exposure of the gp binding site in gp will lead to a functional protein. intramolecular quality control ensures precision to the level of single residues as well as precision of timing.

conserved and multiple roles for signal peptides

post-translational signal-peptide cleavage of gp is conserved across different subtypes of hiv- as biochemical properties, even if sequences are not strictly conserved (snapp et al., ). this appears to be more general, as in other organisms signal peptides mutate at a lower rate than the surrounding mature peptide (morrison et al., ; williams et al., ), or they mutate at the same rate, but with an increased proportion of null (veitia and caburet, ) or function-preserving mutants (garcia-maroto et al., ). function-altering mutants often have deleterious effects (bonfanti et al., ; piersma et al., ). detailed kinetic analysis of signal-peptide cleavage has not been reported for a great number of proteins, and western blotting often does not offer the necessary resolution, but gp is not alone in its biosynthesis-dependent and biosynthesis-regulated signal-peptide cleavage (anjos et al., ; daniels et al., ; matczuk et al., ; rutkowski et al., ; zschenker et al., ). whereas structure regulates cleavage in hcmv us (rehm et al., ; tamura et al., ), function regulates cleavage in
erad-associated protein edem (rehm et al., ; tamura et al., ). these studies demonstrate that a variety of conditions including nascent-chain length and n-glycan addition can play a role in signal peptide cleavage. signal peptides are more than address labels and folding and signal-peptide cleavage are more interdependent than originally thought (li et al., ; rehm et al., ; tamura et al., ). we argue that late signal-peptide cleavage may be much more common than biochemical experiments have uncovered. cleavage may occur at any time from co-translationally until late post-translationally. the low rate of protein synthesis, ~ to amino acids per second (braakman et al., ; horwitz et al., ; ingolia et al., ; knopf and lamfrom, ), can lead to long average synthesis times (~ . – min for gp and ~ – min for gp ). in fact, translation rates are much more heterogeneous (ingolia et al., ): nascent chains of influenza virus ha may take > min to complete, corresponding to a rate of less than one residue per second [(braakman et al., ); unpublished observations]. for large proteins, the difference between early and late co-translational cleavage leaves a window of several minutes, during which the signal peptide functions as an anchor tethering the protein to the er membrane. sequence features in the signal peptide, such as an exposed cysteine, are given the opportunity to interact with the folding protein. the membrane tether limits conformational freedom of the protein and reduces overall conformational entropy, which is predicted to increase fidelity of protein folding and stability (dill and alonso, ; zhou, ; zhou and dill, ). this may well benefit
the formation of n- and c-terminal contacts in proteins, which are present in ~ % of soluble pdb structures (krishna and englander, ) and in multiple multimeric viral glycoproteins (chen et al., ; garces et al., ; gogala et al., ; sauter et al., ; sun et al., ).

here we have presented compelling evidence for the direct functional contribution of the signal peptide to hiv- gp folding. the signal peptide drives disulfide isomerization of gp during folding, increasing conformational plasticity while tethering the n-terminus, and functions as a quality-control organizer, leaving only after near-native conformation has been attained. as evidence grows, it becomes clear that signal peptides demonstrate functions far beyond their originally assigned roles as cellular postal codes.

acknowledgements: we would like to thank members of the braakman-van der sluijs and sanders labs for their fruitful discussions and insights, in particular peter van der sluijs for critical reading of the manuscript and joseline houwman for critical reading of the manuscript and design of figure . this work was supported by grants from the dutch research council (nwo)-chemical sciences (i.br, n.m, a.l, m.q), the european union th framework program, itn “virus entry” (i.br, n.m, m.q), and the european union’s horizon research and innovation program under grant agreement no. (r.w.s. and i.bo). r.w.s. is a recipient of a vici grant from the dutch research council (nwo).
author contributions: conceptualization: n.m, m.q and i.b. methodology: n.m. investigation: n.m, m.q, i.bo and a.l. writing – original draft: n.m, m.q, and i.b. writing – review & editing: n.m, m.q, i.bo, r.w.s, a.l and i.b. funding acquisition: r.w.s and i.b.

declaration of interests: the authors declare no competing interests.

figure legends

figure . signal-peptide cleavage requires the gp c-terminus.
a) schematic representation of gp amino-acid sequence with its signal peptide (orange) still attached [adapted from (leonard et al., )]. gp inner domain (grey) and outer domain (pink) [according to (pancera et al., )], cysteines (red) are numbered and disulfide bonds represented by red bars. thickness of disulfide bonds is representative of their importance for folding and/or infectivity [thickest: essential for folding; middle: dispensable for folding, essential for infectivity; thinnest: dispensable for both folding and infectivity (van anken et al., )]. gp contains five constant regions (c -c ) and five variable regions (green, v -v ). oligomannose and complex glycans are represented as three- or two-pronged forked symbols respectively [adapted from (leonard et al., )]. amino-acid stretch - is marked in teal. b) hela cells transiently expressing gp wt and c-terminal truncations were radiolabeled for minutes and chased for the indicated times. after detergent lysis, samples were immunoprecipitated with polyclonal antibody . after immunoprecipitation, samples
were analyzed by non-reducing (cells nr) and reducing (cells r) . % sds-page. gp was immunoprecipitated from medium samples with antibody and directly analyzed by reducing . % sds-page (medium). gels were dried and exposed to kodak-mr films or fujifilm phosphor screens for quantification. ru: reduced, signal-peptide-uncleaved gp , rc: reduced, signal-peptide-cleaved gp , it: intermediates, nt: native.

figure . integration of gp n- and c-terminus regulates signal-peptide cleavage.
a) gp crystal structure, cez (garces et al., ), domains are colored as in figure a. n and c termini are indicated; disulfide bonds are shown as red lines. inner-domain β-sandwich is boxed. b) zoom in of inner-domain β-sandwich. c-terminal β-strand in teal with k forming hydrogen bonds (dashed lines) with e , e and main-chain oxygen of n . beta strands are numbered, and disulfide bonds indicated as red lines. amino acids are named and numbered according to hxb sequence. c) experiments as in figure c with hela cells expressing wt gp or indicated β-sandwich mutants. d) quantifications of experiments performed as in c; intracellular levels at ’ were used to correct for differences in expression between mutants, and corrected values were compared to wild-type secretion at h. error bars: sd. e) as in d except % signal peptide cleaved at h was measured from reducing gels. f) luciferase-based infectivity assay on tzm-bl cells. cells were infected with pg of hiv- lai virus containing wt or mutant gp . error bars: sd. g) pulse-chase
performed as in figure b. **: p< . , ****: p< . . complete statistical values are listed in table .

figure . signal-peptide cysteine is involved in gp oxidative folding.
a and b) pulse-chase experiments performed as in figure b. ru: reduced, signal-peptide-uncleaved gp , rc: reduced, signal-peptide-cleaved gp , it: intermediates, nt: native. c) schematic representation of gp x truncation construct with its signal anchor (orange), ectodomain (grey), numbered cysteines (red) and disulfide bond indicated by red bar; c-terminal ha tag in yellow; n-glycan depicted as forked structure. d) schematic representation of mpeg alkylation-switch assay. in short, free cysteines are alkylated by nem, which is excluded from disulfide bonds. disulfide bonds are then reduced and the resulting free cysteines are alkylated with mpeg-maleimide, which provides a kda shift per cysteine alkylated when analyzed by sds-page. e) hek t cells expressing the indicated x truncations were pulse labeled for minutes in the presence (+) or absence (–) of mm dtt. at the end of the pulse, cells were scraped from dishes, homogenized and subjected to the double-alkylation mpeg-maleimide protocol depicted in d (appenzeller-herzog and ellgaard, ). after alkylation, samples were immunoprecipitated with a polyclonal antibody recognizing the ha-tag and analyzed by non-reducing - % gradient sds-page. *: background band. f) autoradiographs from experiments performed as described in e were quantified. error bars: sd. **: p< . ; complete statistical values are listed in table .
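the readout logic of the mpeg alkylation-switch assay described in the figure legend above can be sketched as follows. this is a minimal illustration, not the authors' analysis: the function names are hypothetical, and the per-cysteine shift is passed in as a parameter because its value is not given in the text; the 5.0 used in the example calls is a placeholder only.

```python
def mpeg_labelled_cysteines(total_cysteines: int, free_cysteines: int) -> int:
    """Count cysteines expected to carry mPEG-maleimide (hypothetical helper).

    Free thiols are blocked by NEM in the first alkylation step, so only
    cysteines that were disulfide-bonded at that point become available
    for mPEG-maleimide after the reduction step.
    """
    if total_cysteines < 0 or not 0 <= free_cysteines <= total_cysteines:
        raise ValueError("need 0 <= free_cysteines <= total_cysteines")
    return total_cysteines - free_cysteines


def expected_shift_kda(total_cysteines: int, free_cysteines: int,
                       shift_per_cysteine_kda: float) -> float:
    """Expected gel shift: one fixed increment per mPEG-labelled cysteine."""
    labelled = mpeg_labelled_cysteines(total_cysteines, free_cysteines)
    return labelled * shift_per_cysteine_kda


# Placeholder value for illustration only; the real shift is not stated here.
SHIFT_KDA = 5.0

# Two cysteines paired in one disulfide bond: both are labelled with mPEG.
print(expected_shift_kda(2, 0, SHIFT_KDA))  # 10.0

# Both cysteines free (for example, labelled in the presence of DTT):
# NEM blocks them, so no mPEG shift is expected.
print(expected_shift_kda(2, 2, SHIFT_KDA))  # 0.0
```

under these assumptions, a band that does not shift indicates cysteines that were free at the time of nem treatment, whereas each shift increment corresponds to one cysteine that was protected in a disulfide bond.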
figure . gp exhibits long-distance, non-native disulfide bonds during early folding.
a) schematic representation of gp thrombin-cleavable construct. inner domain (grey), outer domain (pink) and variable loops (green) from figure a. black bar indicates cleavage site for thrombin. b) pulse-chase experiments conducted as in figure b with a min pulse labeling, except that detergent lysates were immunoprecipitated with polyclonal serum ht . after immunoprecipitation, samples were cleaved with thrombin or mock treated for ’ at rt. all samples then were analyzed by - % discontinuous sds-page. nc: non-cleaved, full-length protein, c’: c-terminal fragment; n’: n-terminal fragment. c) zoom in of gels from b showing full-length and c-terminal fragments; lane profiles were generated from autoradiographs in imagequant tl. d) quantifications of autoradiographs from b. values were calculated by dividing the signal in the n-terminal fragment by the full-length uncleaved protein and subtracting the value for reducing conditions from non-reducing conditions to determine percent of molecules with a long-distance disulfide bond. resulting values then were normalized to wild type. error bars: sd. *: p< . , **: p< . , ***: p< . . complete statistical values are listed in table .

figure . folding of thrombin-cleavable gp construct.
a) pulse-chase experiments conducted as in figure b except that hela cells were transfected with wild-type full-length gp (gp wt) or thrombin-cleavable gp (gp th) and pulsed for minutes.
b) pulse-chase experiments conducted as in figure b except that hela cells were transfected with cysteine mutants of gp th. polyclonal serum ht was used for immunoprecipitation from detergent lysates and samples were analyzed by . % non-reducing sds-page. it: folding intermediates, nt: native gp , ru: reduced signal-peptide-uncleaved gp , rc: reduced signal-peptide-cleaved gp . red text in (a) refers to gp th running positions.

figure . c a gp is detrimental to hiv- production and pseudovirus infectivity.
a) hek t cells were transfected with wild-type or mutant plai constructs and virus production was measured by ca-p elisa. b) infection assays were performed as in figure f except with wild-type or c a gp containing hiv- , as produced in a. bg = background. c) virus produced as in a, except cells expressed wild-type or mutant jr-fl constructs along with packaging plasmids. d) infectivity assays were performed as in figure e. error bars: sd, psg Δenv: virus produced without gp plasmid, bg: background, *: p< . , **: p< . , ****: p< . . complete statistical values are listed in table .

figure . model for gp folding, signal-peptide cleavage and intramolecular disulfide shuffling.
a) post-translational domain folding and signal-peptide cleavage of gp . grey: inner domain, bright pink: outer domain, green: variable loops, orange: signal peptide, light pink: ribosome, blue: sec translocon. b) conformational changes in the signal peptide and proximal areas during gp folding that lead to cleavage. colors as in (a).
c) c sustains intramolecular disulphide isomerization by interacting with downstream cysteine residues. solid lines: interactions found experimentally, dashed lines: predicted interactions. colors as in (a).

table . complete statistical reporting for experiments in figures , and .

figure
columns: anova summary (f, p value, r squared); multiple comparisons (sidak’s multiple comparison test: pair, adjusted p value)
d: . < . . ; pairs: wt vs k e wt vs e k wt vs e k k e vs e k k e ; adjusted p values: < . . . .
e: . < . . ; pairs: wt vs k e wt vs e k wt vs e k wt vs e k k e vs e k k e ; adjusted p values: < . . . < . < .
f: . < . . ; pairs: wt vs k e wt vs e k wt vs e k wt vs e k k e vs e k k e ; adjusted p values: < . < . < . < . .

figure d
columns: comparison, p value, method
wt vs c a . paired t test
wt vs c a . paired t test
wt vs c a . paired t test
wt vs c a c a . paired t test
wt vs c a v a . paired t test
wt vs c - a . paired t test
wt vs c a c - a . paired t test

figure
columns: p value, method
a . unpaired t test
b < . unpaired t test
c . unpaired t test
d < . unpaired t test

materials and methods

plasmids, antibodies, reagents and viruses

the full-length molecular clone of hiv- lai (plai) was the source of wild-type and mutant viruses (peden et al., ). the quikchange site-directed mutagenesis kit (stratagene) was used to introduce mutations into env in plasmid prs as described before; the entire env gene was verified by dna sequencing (sanders et al., ). mutant env genes from prs were cloned back into plai as sali-bamhi fragments. for transient transfection of gp / we used the previously described pmq plasmid (snapp et al., ). c-terminal truncations were generated by pcr of wt gp and gibson assembled back into xbai/xhoi digested pmq. the thrombin-cleavable construct was designed based on stable v v loop deletion number (bontjer et al., ) and generated from gp c - a using gibson assembly (gibson et al., ). all point mutations were introduced using quikchange site-directed mutagenesis as above. for immunoprecipitation we used the previously described polyclonal rabbit anti-gp antibody, which recognizes all forms of gp (land et al., ); polyclonal antibody ht (nih ), which was obtained from the nih aids reagent program; and anti-ha tag antibody “mrbrown”, produced by us (schildknegt et al., ).

although we studied gp of the lai isolate, we followed the canonical hxb residue numbering (genbank: k .
), which relates to the lai numbering as follows: because of an insertion of five residues in the v loop of lai gp , all cysteine residues beyond this loop have a number that is residues lower in hxb than in lai: until cys , numbering is identical, but cys in lai becomes in hxb , etc.

thrombin was purchased as a lyophilized powder from sigma aldrich (t- ) and stored in thrombin-storage buffer [ mm sodium citrate ph . , mm nacl, . % bsa (w/v), % glycerol (w/v)].

cells and transfections

the supt cell line was cultured in advanced rpmi medium (gibco), supplemented with % fetal calf serum (v/v, fcs), mm l-glutamine (gibco), units/ml penicillin and µg/ml streptomycin. the tzm-bl reporter cell line, obtained from nih aids research and reference reagent program, division of aids, niaid, nih (john c. kappes, xiaoyun wu, and tranzyme, inc., (durham, nc)), the hek t cell line, and the c a cell line were cultured in dulbecco’s modified eagle medium (gibco) containing % fcs, units/ml penicillin and µg/ml streptomycin. hela cells (atcc) were maintained in mem containing % fcs, nonessential amino acids, glutamax and penicillin/streptomycin ( u/ml). twenty-four hours before pulse labeling, hela cells were transfected with pmq gp /gp or ha constructs using polyethylenimine (polysciences) as described before (hoelen et al., ).

virus production

virus stocks were produced by transfecting hek t cells with wild-type or mutant plai constructs using the lipofectamine transfection reagent (invitrogen) per manufacturer’s protocol.
production of virus stocks on c a cells was done by calcium-phosphate precipitation. the virus-containing culture supernatants were harvested days post-transfection, stored at - °c, and the virus concentrations were quantitated by ca-p elisa as described before (moore and jarrett, ). these values were used to normalize the amount of virus used in subsequent infection experiments.

single cycle infection

the tzm-bl reporter cell line stably expresses high levels of cd and hiv- coreceptors ccr and cxcr and contains the luciferase and β-galactosidase genes under the control of the hiv- long-terminal-repeat (ltr) promoter (wei et al., ). single-cycle infectivity assays were performed as described before (bontjer et al., ; bontjer et al., ). in brief, one day prior to infection, x tzm-bl cells per well were plated on a -well plate in dmem containing % fcs, units/ml penicillin and µg/ml streptomycin and incubated at ºc with % co . a fixed amount of lai virus ( pg of ca-p ) or a fixed amount of jr-fl or lai pseudo-virus ( , pg of ca-p ) was added to the cells ( - % confluency) in the presence of nm saquinavir (roche) to block secondary rounds of infection and µg/ml deae in a total volume of µl. two days post-infection, medium was removed, cells were washed with phosphate-buffered saline ( mm sodium phosphate buffer, ph . , mm nacl) and lysed in reporter lysis buffer (promega). luciferase activity was measured using a luciferase assay kit (promega) and a glomax luminometer (turner biosystems) per manufacturer’s instructions.
uninfected cells were used to correct for background luciferase activity. all infections were performed in quadruplicate.

folding assay

hela cells transfected with wild-type or mutant gp /gp constructs were subjected to pulse-chase analysis as described before (mccaul et al., ; snapp et al., ). in short, cells were starved for cysteine and methionine for - min and pulse labeled for min with µci/ -mm dish of easytag express s protein labeling mix (perkin elmer). where indicated (+dtt), cells were incubated with mm dtt for min before and during the pulse. the pulse was stopped, and chase started, by the first of washes with chase medium containing an excess of unlabeled cysteine and methionine. at the end of each chase, medium was collected, cells were cooled on ice, and further disulfide-bond formation and isomerization were blocked with mm iodoacetamide. cells were lysed and detergent lysates and medium samples were subjected to overnight immunoprecipitation at c with polyclonal antibody against gp .

deglycosylation, sds-page, and autoradiography

where appropriate, to identify gp folding intermediates, glycans were removed from lysate-derived gp or gp with endoglycosidase h (roche) treatment of the immunoprecipitates as described before (land et al., ). samples were subjected to non-reducing and reducing ( mm dtt) sds-page. gels were dried and exposed to super-resolution phosphor screens (fujifilm) or kodak biomax mr films (carestream). phosphor screens were scanned with a typhoon fla- scanner (ge healthcare life sciences).
quantifications were performed with imagequanttl software (ge healthcare life sciences).

mpeg treatment

hek t cells transfected with wild-type or mutant x were subjected to radioactive labeling as described above. at the end of the labeling, cells were transferred to ice and incubated in dulbecco’s pbs without ca + and mg + containing mm n-ethylmaleimide (nem) and mm edta. cells then were subjected to a modified “double-alkylation variant” mpeg treatment as described by appenzeller-herzog and ellgaard (appenzeller-herzog and ellgaard, ). in short, cells were homogenized by passage through a -g needle and proteins denatured with % sds for h at c. samples then were alkylated again with mm nem before immunoprecipitation with anti-ha tag antibody mrbrown for hours at c. after immunoprecipitation, samples were denatured and reduced with mm tcep followed by incubation with mm mpeg-mal for h at room temperature. samples were immunoprecipitated again via the ha-tag and analyzed by - % non-reducing gradient sds-page (biorad) and processed as before.

thrombin cleavage

after hela cells transfected with various thrombin-cleavable constructs were pulse-labeled as described above, detergent lysates were immunoprecipitated with antibody ht for h at c with rotation. immunoprecipitates were washed and resuspended in µl thrombin cleavage buffer ( mm tris-hcl, ph . , mm nacl, . mm cacl ) + . % sds and denatured for minutes at c. sds was quenched by addition of µl cleavage buffer + % tx . thrombin ( . u) in µl cleavage buffer then was added to samples and incubated for exactly minutes.
for mock-digested samples, an equivalent volume of thrombin storage buffer was added instead. digestion was stopped by the addition of hot ( °c) x sample buffer and immediately placing the samples in a c heat block for minutes. samples then were subjected to non-reducing or reducing ( mm dtt) - % discontinuous-gradient sds-page and processed as before.

statistical reporting

statistics for each experiment were calculated using prism (graphpad). for experiments in figures d-f differences were assessed using a one-way anova with follow-up testing to analyze differences between specific pairs with p values corrected for multiple comparisons. for experiments in figure f and d differences were assessed using paired t-tests between wild-type and mutants. for experiments in figure a-d differences were assessed using unpaired t-tests. a complete list of all pairs examined, statistical methods and resulting p values can be found in table .

figure s . n-terminal truncations of gp retain their signal peptides.
pulse-chase experiments were performed as in figure b except that hela cells were transfected with the indicated gp truncations. detergent lysates were immunoprecipitated either with polyclonal serum (a) or a polyclonal serum that recognizes the signal peptide (b).

figure s . inner domain β-sandwich mutants affect gp folding and hiv infectivity.
a) pulse-chase experiments were performed as in figure c except that hela cells were transfected with the indicated mutants. b) uncropped gels from figure c.
it: folding intermediates, nt: native gp , ru: reduced signal-peptide-uncleaved gp , rc: reduced signal-peptide-cleaved gp . c) quantifications performed as in figure d. d) quantifications performed as in figure e. e) infection assays were performed as in figure f. error bars: sd.

figure s . synchronized folding of gp wt and c a.
a) pulse-chase experiment was performed as in figure b except that cells expressing wt or c a gp were treated from minutes before the pulse with mm puromycin and chased in the presence of mm cycloheximide. samples were analyzed by reducing . % sds-page after immunoprecipitation. b) lane profiles from a. ru: reduced signal-peptide-uncleaved gp , rc: reduced signal-peptide-cleaved gp .

figure s . removal of signal-peptide cysteine c aggravates folding phenotype of - disulfide-bond mutants.
pulse-chase experiments were performed as in figure b except that hela cells expressed c and disulfide-bond - mutants.

references

anjos, s., nguyen, a., ounissi-benkalha, h., tessier, m.c., and polychronakos, c. ( ). a common autoimmunity predisposing signal peptide variant of the cytotoxic t-lymphocyte antigen results in inefficient glycosylation of the susceptibility allele. j biol chem , - .
appenzeller-herzog, c., and ellgaard, l. ( ). in vivo reduction-oxidation state of protein disulfide isomerase: the two active sites independently occur in the reduced and oxidized forms. antioxid redox signal , - .
blobel, g., and dobberstein, b. ( ). transfer of proteins across membranes. i.
presence of proteolytically processed and unprocessed nascent immunoglobulin light chains on membrane-bound ribosomes of murine myeloma. j cell biol , - .
bonfanti, r., colombo, c., nocerino, v., massa, o., lampasona, v., iafusco, d., viscardi, m., chiumello, g., meschi, f., and barbetti, f. ( ). insulin gene mutations as cause of diabetes in children negative for five type diabetes autoantibodies. diabetes care , - .
bontjer, i., land, a., eggink, d., verkade, e., tuin, k., baldwin, c., pollakis, g., paxton, w.a., braakman, i., berkhout, b., et al. ( ). optimization of human immunodeficiency virus type envelope glycoproteins with v /v deleted, using virus evolution. j virol , - .
borochov, h., and shinitzky, m. ( ). vertical displacement of membrane proteins mediated by changes in microviscosity. proc natl acad sci u s a , - .
braakman, i., hoover-litty, h., wagner, k.r., and helenius, a. ( ). folding of influenza hemagglutinin in the endoplasmic reticulum. j cell biol , - .
brockwell, d.j., paci, e., zinober, r.c., beddard, g.s., olmsted, p.d., smith, d.a., perham, r.n., and radford, s.e. ( ). pulling geometry defines the mechanical resistance of a beta-sheet protein. nat struct biol , - .
carrion-vazquez, m., li, h., lu, h., marszalek, p.e., oberhauser, a.f., and fernandez, j.m. ( ). the mechanical stability of ubiquitin is linkage dependent. nat struct biol , - .
chen, j., lee, k.h., steinhauer, d.a., stevens, d.j., skehel, j.j., and wiley, d.c. ( ).
structure of the hemagglutinin precursor cleavage site, a determinant of influenza pathogenicity and the origin of the labile conformation. cell , - .
chen, y., radford, s.e., and brockwell, d.j. ( ). force-induced remodelling of proteins and their complexes. curr opin struct biol , - .
creighton, t.e., bagley, c.j., cooper, l., darby, n.j., freedman, r.b., kemmink, j., and sheikh, a. ( ). on the biosynthesis of bovine pancreatic trypsin inhibitor (bpti). structure, processing, folding and disulphide bond formation of the precursor in vitro and in microsomes. j mol biol , - .
daniels, r., kurowski, b., johnson, a.e., and hebert, d.n. ( ). n-linked glycans direct the cotranslational folding pathway of influenza hemagglutinin. mol cell , - .
danielson, m.a., biemann, h.p., koshland, d.e., jr., and falke, j.j. ( ). attractant- and disulfide-induced conformational changes in the ligand binding domain of the chemotaxis aspartate receptor: a f nmr study. biochemistry , - .
darby, n.j., morin, p.e., talbo, g., and creighton, t.e. ( ). refolding of bovine pancreatic trypsin inhibitor via non-native disulphide intermediates. j mol biol , - .
decroly, e., vandenbranden, m., ruysschaert, j.m., cogniaux, j., jacob, g.s., howard, s.c., marshall, g., kompelli, a., basak, a., jean, f., et al. ( ). the convertases furin and pc can both cleave the human immunodeficiency virus (hiv)- envelope glycoprotein gp into gp (hiv- su) and gp (hiv-i tm). j biol chem , - .
del rio, a., perez-jimenez, r., liu, r., roca-cusachs, p., fernandez, j.m., and sheetz, m.p. ( ).
stretching single talin rod molecules activates vinculin binding. science , - .
dill, k.a., and alonso, d.o.v. ( ). conformational entropy and protein stability (berlin, heidelberg: springer berlin heidelberg).
earl, p.l., doms, r.w., and moss, b. ( ). oligomeric structure of the human immunodeficiency virus type envelope glycoprotein. proc natl acad sci u s a , - .
earl, p.l., moss, b., and doms, r.w. ( ). folding, interaction with grp -bip, assembly, and transport of the human immunodeficiency virus type envelope protein. j virol , - .
eckels, e.c., haldar, s., tapia-rojo, r., rivas-pardo, j.a., and fernandez, j.m. ( ). the mechanical power of titin folding. cell rep , - e .
ellgaard, l., mccaul, n., chatsisvili, a., and braakman, i. ( ). co- and post-translational protein folding in the er. traffic , - .
garces, f., lee, j.h., de val, n., de la pena, a.t., kong, l., puchades, c., hua, y., stanfield, r.l., burton, d.r., moore, j.p., et al. ( ). affinity maturation of a potent family of hiv antibodies is primarily focused on accommodating or avoiding glycans. immunity , - .
garcia-maroto, f., castagnaro, a., sanchez de la hoz, p., marana, c., carbonero, p., and garcia-olmedo, f. ( ). extreme variations in the ratios of non-synonymous to synonymous nucleotide substitution rates in signal peptide evolution. febs lett , - .
gibson, d.g., young, l., chuang, r.y., venter, j.c., hutchison, c.a., rd, and smith, h.o. ( ). enzymatic assembly of dna molecules up to several hundred kilobases. nat methods , - .
gogala, m., becker, t., beatrix, b., armache, j.p., barrio-garcia, c., berninghausen, o., and beckmann, r. ( ). structures of the sec complex engaged in nascent peptide translocation or membrane insertion. nature , - . gordon, w.r., zimmerman, b., he, l., miles, l.j., huang, j., tiyanont, k., mcarthur, d.g., aster, j.c., perrimon, n., loparo, j.j., et al. ( ). mechanical allostery: evidence for a force requirement in the proteolytic activation of notch. dev cell , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / gorlich, d., hartmann, e., prehn, s., and rapoport, t.a. ( ). a protein of the endoplasmic reticulum involved early in polypeptide translocation. nature , - . görlich, d., prehn, s., hartmann, e., kalies, k.-u., and rapoport, t.a. ( ). a mammalian homolog of sec p and secyp is associated with ribosomes and nascent polypeptides during translocation. cell , - . hallenberger, s., bosch, v., angliker, h., shaw, e., klenk, h.d., and garten, w. ( ). inhibition of furin-mediated cleavage activation of hiv- glycoprotein gp . nature , - . hegde, r.s., and bernstein, h.d. ( ). the surprising complexity of signal sequences. trends biochem sci , - . hertadi, r., gruswitz, f., silver, l., koide, a., koide, s., arakawa, h., and ikai, a. ( ). unfolding mechanics of multiple ospa substructures investigated with single molecule force spectroscopy. j mol biol , - . hoelen, h., kleizen, b., schmidt, a., richardson, j., charitou, p., thomas, p.j., and braakman, i. ( ). the primary folding defect and rescue of deltaf cftr emerge during translation of the mutant domain. plos one , e . 
horwitz, m.s., scharff, m.d., and maizel, j.v., jr. ( ). synthesis and assembly of adenovirus . i. polypeptide synthesis, assembly of capsomeres, and morphogenesis of the virion. virology , - . ingolia, n.t., hussmann, j.a., and weissman, j.s. ( ). ribosome profiling: global views of translation. cold spring harb perspect biol . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / ingolia, n.t., lareau, l.f., and weissman, j.s. ( ). ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes. cell , - . jackson, r.c., and blobel, g. ( ). post-translational cleavage of presecretory proteins with an extract of rough microsomes from dog pancreas containing signal peptidase activity. proc natl acad sci u s a , - . jansen, g., maattanen, p., denisov, a.y., scarffe, l., schade, b., balghi, h., dejgaard, k., chen, l.y., muller, w.j., gehring, k., et al. ( ). an interaction map of endoplasmic reticulum chaperones and foldases. mol cell proteomics , - . julien, j.p., cupo, a., sok, d., stanfield, r.l., lyumkis, d., deller, m.c., klasse, p.j., burton, d.r., sanders, r.w., moore, j.p., et al. ( ). crystal structure of a soluble cleaved hiv- envelope trimer. science , - . kanapin, a., batalov, s., davis, m.j., gough, j., grimmond, s., kawaji, h., magrane, m., matsuda, h., schonbach, c., teasdale, r.d., et al. ( ). mouse proteome analysis. genome res , - . knopf, p.m., and lamfrom, h. ( ). changes in the ribosome distribution during incubation of rabbit reticulocytes in vitro. biochim biophys acta , - . krishna, m.m., and englander, s.w. ( ). 
the n-terminal to c-terminal motif in protein folding and function. proc natl acad sci u s a , - . kwong, p.d., wyatt, r., robinson, j., sweet, r.w., sodroski, j., and hendrickson, w.a. ( ). structure of an hiv gp envelope glycoprotein in complex with the cd receptor and a neutralizing human antibody. nature , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / land, a., and braakman, i. ( ). folding of the human immunodeficiency virus type envelope glycoprotein in the endoplasmic reticulum. biochimie , - . land, a., zonneveld, d., and braakman, i. ( ). folding of hiv- envelope glycoprotein involves extensive isomerization of disulfide bonds and conformation- dependent leader peptide cleavage. faseb j , - . li, y., bergeron, j.j., luo, l., ou, w.j., thomas, d.y., and kang, c.y. ( ). effects of inefficient cleavage of the signal sequence of hiv- gp on its association with calnexin, folding, and intracellular transport. proc natl acad sci u s a , - . li, y., luo, l., thomas, d.y., and kang, c.y. ( ). control of expression, glycosylation, and secretion of hiv- gp by homologous and heterologous signal sequences. virology , - . li, y., luo, l., thomas, d.y., and kang, c.y. ( ). the hiv- env protein signal sequence retards its cleavage and down-regulates the glycoprotein folding. virology , - . lingappa, v.r., devillers-thiery, a., and blobel, g. ( ). nascent prehormones are intermediates in the biosynthesis of authentic bovine pituitary growth hormone and prolactin. proc natl acad sci u s a , - . liu, s., cheng, w., fowle grider, r., shen, g., and li, w. ( ). 
structures of an intramembrane vitamin k epoxide reductase homolog reveal control mechanisms for electron transfer. nat commun , . lyumkis, d., julien, j.p., de val, n., cupo, a., potter, c.s., klasse, p.j., burton, d.r., sanders, r.w., moore, j.p., carragher, b., et al. ( ). cryo-em structure of a fully glycosylated soluble cleaved hiv- envelope trimer. science , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / matczuk, a.k., kunec, d., and veit, m. ( ). co-translational processing of glycoprotein from equine arteritis virus: n-glycosylation adjacent to the signal peptide prevents cleavage. j biol chem , - . mccaul, n., yeoh, h.y., van zadelhoff, g., lodder, n., kleizen, b., and braakman, i. ( ). analysis of protein folding, transport, and degradation in living cells by radioactive pulse chase. j vis exp. moore, j.p., and jarrett, r.f. ( ). sensitive elisa for the gp and gp surface glycoproteins of hiv- . aids res hum retroviruses , - . morrison, g.m., semple, c.a., kilanowski, f.m., hill, r.e., and dorin, j.r. ( ). signal sequence conservation and mature peptide divergence within subgroups of the murine beta-defensin gene family. mol biol evol , - . mowbray, s.l., and koshland, d.e., jr. ( ). additive and independent responses in a single receptor: aspartate and maltose stimuli on the tar protein. cell , - . peden, k., emerman, m., and montagnier, l. ( ). changes in growth properties on passage in tissue culture of viruses derived from infectious molecular clones of hiv- lai, hiv- mal, and hiv- eli. virology , - . pfeiffer, t., pisch, t., devitt, g., holtkotte, d., and bosch, v. ( ). 
effects of signal peptide exchange on hiv- glycoprotein expression and viral infectivity in mammalian cells. febs lett , - . piersma, d., berns, e.m., verhoef-post, m., uitterlinden, a.g., braakman, i., pols, h.a., and themmen, a.p. ( ). a common polymorphism renders the luteinizing hormone receptor protein more active by improving signal peptide function and predicts adverse outcome in breast cancer patients. j clin endocrinol metab , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / rehm, a., stern, p., ploegh, h.l., and tortorella, d. ( ). signal peptide cleavage of a type i membrane protein, hcmv us , is dependent on its membrane anchor. embo j , - . rognoni, l., stigler, j., pelz, b., ylanne, j., and rief, m. ( ). dynamic force sensing of filamin revealed in single-molecule experiments. proc natl acad sci u s a , - . rutkowski, d.t., ott, c.m., polansky, j.r., and lingappa, v.r. ( ). signal sequences initiate the pathway of maturation in the endoplasmic reticulum lumen. j biol chem , - . sanders, r.w., dankers, m.m., busser, e., caffrey, m., moore, j.p., and berkhout, b. ( ). evolution of the hiv- envelope glycoproteins with a disulfide bond between gp and gp . retrovirology , . sanders, r.w., hsu, s.t., van anken, e., liscaljet, i.m., dankers, m., bontjer, i., land, a., braakman, i., bonvin, a.m., and berkhout, b. ( ). evolution rescues folding of human immunodeficiency virus- envelope glycoprotein gp lacking a conserved disulfide bond. mol biol cell , - . sauter, n.k., hanson, j.e., glick, g.d., brown, j.h., crowther, r.l., park, s.j., skehel, j.j., and wiley, d.c. ( ). 
binding of influenza virus hemagglutinin to analogs of its cell-surface receptor, sialic acid: analysis by proton nuclear magnetic resonance spectroscopy and x-ray crystallography. biochemistry , - . schildknegt, d., lodder, n., pandey, a., egmond, m., pena, f., braakman, i., and van der sluijs, p. ( ). characterization of cnpy and its family members. protein sci , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / schulman, s., wang, b., li, w., and rapoport, t.a. ( ). vitamin k epoxide reductase prefers er membrane-anchored thioredoxin-like redox partners. proc natl acad sci u s a , - . snapp, e.l., mccaul, n., quandte, m., cabartova, z., bontjer, i., källgren, c., nilsson, i., land, a., von heijne, g., sanders, r.w., et al. ( ). structure and topology around the cleavage site regulate post-translational cleavage of the hiv- gp signal peptide. elife , e . soler, m.a., and faisca, p.f. ( ). how difficult is it to fold a knotted protein? in silico insights from surface-tethered folding experiments. plos one , e . srinivasan, n., sowdhamini, r., ramakrishnan, c., and balaram, p. ( ). conformations of disulfide bridges in proteins. int j pept protein res , - . sun, x., li, q., wu, y., wang, m., liu, y., qi, j., vavricka, c.j., and gao, g.f. ( ). structure of influenza virus n : the last piece of the neuraminidase "jigsaw" puzzle. j virol , - . tamura, t., cormier, j.h., and hebert, d.n. ( ). characterization of early edem protein maturation events and their functional implications. j biol chem , - . tatu, u., braakman, i., and helenius, a. ( ). 
membrane glycoprotein folding, oligomerization and intracellular transport: effects of dithiothreitol in living cells. embo j , - . tatu, u., hammond, c., and helenius, a. ( ). folding and oligomerization of influenza hemagglutinin in the er and the intermediate compartment. embo j , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / van anken, e., sanders, r.w., liscaljet, i.m., land, a., bontjer, i., tillemans, s., nabatov, a.a., paxton, w.a., berkhout, b., and braakman, i. ( ). only five of strictly conserved disulfide bonds are essential for folding and eight for function of the hiv- envelope glycoprotein. mol biol cell , - . van damme, n., goff, d., katsura, c., jorgenson, r.l., mitchell, r., johnson, m.c., stephens, e.b., and guatelli, j. ( ). the interferon-induced protein bst- restricts hiv- release and is downregulated from the cell surface by the viral vpu protein. cell host microbe , - . veitia, r.a., and caburet, s. ( ). extensive sequence turnover of the signal peptides of members of the gdf/bmp family: exploring their evolutionary landscape. biol direct , . von heijne, g. ( ). patterns of amino acids near signal-sequence cleavage sites. eur j biochem , - . von heijne, g. ( ). analysis of the distribution of charged residues in the n-terminal region of signal sequences: implications for protein export in prokaryotic and eukaryotic cells. embo j , - . von heijne, g. ( ). signal sequences. the limits of variation. j mol biol , - . walter, p. ( ). translocation of proteins across the endoplasmic reticulum iii. 
signal recognition protein (srp) causes signal sequence-dependent and site- specific arrest of chain elongation that is released by microsomal membranes. the journal of cell biology , - . weissman, j.s., and kim, p.s. ( ). the pro region of bpti facilitates folding. cell , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / weissman, j.s., and kim, p.s. ( ). efficient catalysis of disulphide bond rearrangements by protein disulphide isomerase. nature , - . weissman, j.s., and kim, p.s. ( ). a kinetic explanation for the rearrangement pathway of bpti folding. nat struct biol , - . williams, e.j., pal, c., and hurst, l.d. ( ). the molecular evolution of signal peptides. gene , - . wyatt, r., and sodroski, j. ( ). the hiv- envelope glycoproteins: fusogens, antigens, and immunogens. science , - . yang, x., mahony, e., holm, g.h., kassa, a., and sodroski, j. ( ). role of the gp inner domain β-sandwich in the interaction between the human immunodeficiency virus envelope glycoprotein subunits. virology , - . yao, m., goult, b.t., chen, h., cong, p., sheetz, m.p., and yan, j. ( ). mechanical activation of vinculin binding to talin locks talin in an unfolded conformation. sci rep , . zhang, x., halvorsen, k., zhang, c.z., wong, w.p., and springer, t.a. ( ). mechanoenzymatic cleavage of the ultralarge vascular protein von willebrand factor. science , - . zhou, h.x. ( ). protein folding in confined and crowded environments. arch biochem biophys , - . zhou, h.x., and dill, k.a. ( ). stabilization of proteins in confined spaces. biochemistry , - . 
zschenker, o., jung, n., rethmeier, j., trautwein, s., hertel, s., zeigler, m., and ameis, d. ( ). characterization of lysosomal acid lipase mutations in the signal .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / peptide and mature polypeptide region causing wolman disease. j lipid res , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . 
[figure residue, figure numbers lost in extraction. recoverable panel labels: signal peptide, er membrane, er lumen and cytosol; ha-tag and signal anchor; pulse-chase gels with chase times, lanes it, nt, ru and rc, and fractions cells nr, cells r and medium; infectivity, signal-peptide cleavage and secretion; alkylation scheme with nem, tcep, mpeg and sds-page; dtt pulse; thrombin cleavage with n- and c-terminal fragments.]
[figure residue (model schematic), figure number lost in extraction. recoverable labels: stages i to iv; cytosol and er lumen; "signal peptide sustains disulfide isomerization and restrains n-terminus"; the remaining labels (α-helical, signal-peptide conformation, prevents cleavage, inner-domain, completion of folding) are scrambled.]

amino acids targeted based metabolomics study in non-segmental vitiligo: a pilot study

rezvan marzabani, hassan rezadoost, peyman chopanian, nikoo mozafari, mohieddin jafari, mehdi mirzaie, mehrdad karimi

department of phytochemistry, medicinal plants and drugs research institute, shahid beheshti university, g.c., tehran, iran; skin research center, shahid beheshti university of medical sciences, tehran, iran; institute for molecular medicine finland (fimm), helsinki institute of life science, university of helsinki, helsinki, finland; department of applied mathematics, faculty of mathematical sciences, tarbiat modares university, tehran, iran; school of traditional medicine, tehran university of medical sciences, tehran, iran

biorxiv preprint; this version posted january , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by . international license. doi: https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ .
abstract

introduction: vitiligo is an asymptomatic disorder that results from the loss of pigment (melanin), causing skin or mucosal depigmentation and impairing appearance.

objective: the pathogenesis of this disease is complex, with competing theories that include the autoimmune theory, oxidative stress, the neurological theory and intrinsic defects of melanocytes, and amino acids play a broad role in body metabolism. a targeted amino acid metabolomics approach was therefore set up to follow metabolic fluctuations in this disease.

methodology: amino acid profiles in the plasma of people with non-segmental vitiligo were measured by liquid chromatography with fluorescence detection to find biomarkers for the diagnosis of vitiligo and the evaluation of disease severity. twenty-two amino acids, derivatized with o-phthalaldehyde (opa) and fluorenylmethyloxycarbonyl chloride (fmoc), were determined. next, the concentrations of these twenty-two amino acids and their corresponding molar ratios were calculated in patients (including females and males) and corresponding healthy individuals ( females and males). using r, the data from the patient and healthy groups were compared to find suitable and reliable biomarkers.

results: in the patient group, tyrosine, cysteine, the ratio of tyrosine to lysine and the ratio of cysteine to ornithine were increased, while arginine, lysine, ornithine and glycine ratios to cysteine were decreased. these amino acids were selected for the identification of patients, with a detection accuracy of approximately . as assessed by logistic regression.

conclusion: these results indicate a disruption of melanin production, increased immune activity and oxidative stress, which are also involved in the effects of vitiligo.
therefore, these amino acids can be used as biomarkers for the evaluation of risk, the prevention of complications in individuals at risk and the monitoring of the treatment process.

keywords: vitiligo, plasma, metabolomics, amino acids, liquid chromatography, r programming

introduction

vitiligo is a common chronic skin disorder in which pigment-producing cells, or melanocytes, are damaged, resulting in varying patterns and degrees of skin depigmentation. patients are characterized by loss of epidermal melanocytes and progressive depigmentation. it appears in two main types, non-segmental (generalized) and segmental (armstrong, ; sahoo et al., ). despite much research, the etiology of vitiligo and the reasons for melanocyte death are still unclear (singh et al., ). a complex combination of immune, genetic, environmental and biochemical causes lies behind vitiligo, and the exact molecular mechanisms of its development and progression are not clear (liang et al., ; sahoo et al., ; singh et al., ). although several vitiligo susceptibility loci identified by genome-wide association studies have been reported, a study examining monozygotic twins reported a vitiligo concordance rate of %, suggesting a strong environmental contribution to the pathogenesis (singh et al., ). zheleva et al. (zheleva et al., ) showed that oxidative stress is a triggering event in melanocyte destruction and is probably involved in the etiopathogenesis of vitiligo. oxidative stress biomarkers can be found in the skin and blood of vitiligo patients. hamidizadeh et al.
(hamidizadeh et al., ) compared hopelessness, anxiety, depression and general health between vitiligo patients and normal controls and confirmed that anxiety and hopelessness levels were significantly higher in vitiligo patients than in healthy controls. it is noteworthy that the worldwide prevalence of vitiligo is in the range of . % to % (ding et al., ). one of the main problems accompanying vitiligo is its psychological burden, which is experienced by many patients around the globe (grimes and miller, ). in addition to social or psychological distress, people with vitiligo may be at increased risk of sunburn, skin cancer, eye problems such as inflammation of the iris (iritis), and hearing loss (jakku et al., ). there are many conventional and unconventional therapies for vitiligo. they include l-phenylalanine, pge and antioxidant agents, alpha lipoic acid, flavonoids, glutathione (gsh), fluorouracil, l-dopa, levamisole, melagenine, omega- polyunsaturated fatty acids, cream containing pseudocatalase, resveratrol soybeans, metals such as zinc, and minoxidil (gianfaldoni et al., ). although the pathophysiology of vitiligo is complex, studies have revealed that vitiligo cells have unique lipid and metabolite profiles (sahoo et al., ). this led to the question of which factors are associated with vitiligo activity in skin and blood. such biomarkers allow an early and accurate determination of treatment response and of the progression of the disease. to date, some biomarkers have been recommended for vitiligo.
several markers have been linked to vitiligo and associated with disease activity. besides providing insights into the driving mechanisms of vitiligo, these findings could reveal potential biomarkers. although genomic analyses have been performed to investigate the pathogenesis of vitiligo, the role of small molecules and serum proteins in vitiligo remains unknown. metabolomics is a powerful and promising analytical tool that allows assessment of global low-molecular-weight metabolites in biological systems. it has great potential for identifying useful biomarkers for early diagnosis, prognosis and assessment of therapeutic interventions in clinical practice (liang et al., ; speeckaert et al., ). given the current evidence that the metabolic system affects the immune system and oxidative stress, two important factors in the development of vitiligo, further investigation of metabolite fluctuations in this disease is needed. we were keen to establish whether levels of important substrates such as amino acids, the most important primary metabolites, were altered in vitiligo cells, since this might contribute to the vitiligo phenotype in melanocytes. the aim of this study was therefore to obtain a comprehensive profile of amino acids in the plasma of people with vitiligo in comparison with healthy people, in order to find a rapidly determinable biomarker. for this purpose, liquid chromatography with fluorescence detection was applied.
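the analysis outlined in the abstract, molar ratios of amino acid concentrations followed by logistic regression to separate patients from healthy controls, can be sketched as follows. this is a minimal illustration in python, not the r code used in the study (which is not given in the text). the concentrations are invented, and the names `molar_ratios`, `center`, `fit_logistic` and `predict` are hypothetical.

```python
import math

# Ratios named in the abstract: tyrosine/lysine and cysteine/ornithine.
PAIRS = [("tyr", "lys"), ("cys", "orn")]

# Synthetic concentrations in arbitrary units; NOT data from the study.
# Label 0 = healthy control, 1 = vitiligo patient.
samples = [
    ({"tyr": 60, "lys": 180, "cys": 40, "orn": 80}, 0),
    ({"tyr": 55, "lys": 170, "cys": 35, "orn": 75}, 0),
    ({"tyr": 65, "lys": 190, "cys": 45, "orn": 85}, 0),
    ({"tyr": 90, "lys": 150, "cys": 60, "orn": 60}, 1),
    ({"tyr": 95, "lys": 140, "cys": 65, "orn": 65}, 1),
    ({"tyr": 85, "lys": 145, "cys": 55, "orn": 58}, 1),
]

def molar_ratios(conc, pairs=PAIRS):
    """Return numerator/denominator for each (numerator, denominator) pair."""
    return [conc[num] / conc[den] for num, den in pairs]

def center(X):
    """Subtract the per-feature mean so the features are centred on zero."""
    means = [sum(col) / len(col) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X], means

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Prediction error (p - y) for every sample under the current model.
        errs = [sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - yi
                for row, yi in zip(X, y)]
        w = [wj - lr * sum(e * row[j] for e, row in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b

def predict(w, b, row):
    """Classify as patient (1) when the linear score is non-negative."""
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else 0

labels = [label for _, label in samples]
ratios = [molar_ratios(conc) for conc, _ in samples]
X, means = center(ratios)
w, b = fit_logistic(X, labels)
preds = [predict(w, b, row) for row in X]
accuracy = sum(p == t for p, t in zip(preds, labels)) / len(labels)
print(f"training accuracy: {accuracy:.2f}")
```

with real data, the ratios would be computed for every amino acid pair of interest, and accuracy should be estimated on held-out samples or by cross-validation rather than on the training set, as done here for brevity.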
material and methods
patient samples
after receiving ethical approval from shahid beheshti university of medical sciences (the study protocol was approved by the ethics committee of the institution), written informed consent was obtained voluntarily from each participant at the time of enrollment. table summarizes the characteristics of the study participants. in summary, cases with vitiligo and healthy controls attended the dermatology clinic of shohadaye tajrish educational hospital. the diagnosis of vitiligo was based on the characteristic loss of skin pigmentation and examination under wood's lamp. blood samples were collected in ml vacutainer tubes containing . k edta (to prevent clotting) and centrifuged at rpm at °c for minutes. the supernatant was isolated and stored at - °c for hplc-fd analysis.
table . demographics of the study cohort
information | hcs* | vitiligo
male | |
female | |
age, years** | . ± . | . ± .
duration of the disease (years) | _ | . ± .
illness severity (body surface area involvement, %) | _ | . ± .
active disease (new lesions during the last months) | _ |
positive family history | _ |
* hcs = healthy controls; ** mean ± sd
amino acid analysis
to prepare the samples for analysis, they were removed from - °c storage and thawed on ice. norleucine ( μm) and then µl of methanol kept at - °c were added to µl of sample, and all were mixed for five seconds. for complete deproteinization, the samples were kept at - °c for hours. the samples were then centrifuged at rpm for twelve minutes at °c.
the supernatant was transferred completely to a heidolph rotary evaporator and dried in vacuo. the dried samples could be stored at °c for four weeks. for hplc analysis, the dried samples were dissolved in µl of water (containing . formic acid) with the help of an ultrasonic device for minutes. to µl of each sample, µl of opa (for derivatization of primary amino acids) was added, followed one minute later by µl of fmoc (for derivatization of secondary amino acids), and µl of this mixture was injected onto the hplc column (fekkes, ; wu et al., ). for the hplc-dad method, a knauer system (wellchrom, germany) equipped with a k- pump, a k- fast scanning uv detector with simultaneous detection at four wavelengths, an autosampler s (midas), a k- analytical degasser, and a rheodyne injector with a µl loop was used. hplc separation was achieved using a eurospher c column ( . mm × mm, µm) with a gradient elution program at a flow rate of . ml min− . the mobile phase was composed of a (acetonitrile + . % trifluoroacetic acid, v/v) and b ( . % aqueous trifluoroacetic acid, v/v). the following gradient was applied: – min, isocratic % b; – min, linear gradient - % b; – min, linear – % b; – min, linear - % b; – min, linear - % b; – min, isocratic % b. the uv absorbance was monitored at nm. all injection volumes of sample and standard solutions were µl. the chromatographic peaks of the sample solution were identified by spiking and by comparing their retention times and uv spectra with those of reference standards. quantitative analysis was carried out by peak integration using the external standard method. amino acids were identified using fluorescence at nm and nm for
absorption and excitation, respectively, for primary amino acids, while wavelengths of (for secondary amino acids) and nm (for primary amino acids) were used with the pda detector. to check the accuracy of the procedure, five plasma samples from individual patients were analyzed, and an rsd of less than was obtained for every amino acid (amorini et al., ; douglas, ; wu et al., ).
statistical methods
for statistical analysis, we used metaboanalyst . . before the analysis, we applied data transformation and mean-centered scaling, followed by quantile normalization (the data were assessed with the shapiro-wilk test in r and were not normally distributed for some amino acids). to compare the study groups in r, we used the mann-whitney u test with fdr correction (benjamini-hochberg) (table ). in addition to comparing patients and healthy controls, the relationship of each amino acid with disease severity, disease activity, family history, and disease duration was evaluated using mann-whitney u tests with fdr correction (benjamini-hochberg) and the mean prediction score (random forest) (table , figure ). to examine the trend of differences in metabolite levels between the two groups (figure ), partial least squares discriminant analysis (pls-da) was used (figure ). in addition, to investigate pairwise correlations between amino acids across all participants, the correlation matrix was plotted as a heatmap with significant differences indicated (figure ). a heatmap was also plotted to investigate the relationship between each amino acid and the participants (figure ).
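the benjamini-hochberg fdr correction named in the statistical methods can be illustrated with a minimal sketch. this is not the authors' code (the text states that r and metaboanalyst were used); the function name `benjamini_hochberg`, the amino acid labels and the p-values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four amino acids (illustrative only,
# not values from the study).
raw = {"cys": 0.001, "lys": 0.008, "gly": 0.039, "ala": 0.60}
adj = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
```

each raw p-value is multiplied by the number of tests divided by its rank, and a running minimum taken from the largest p-value downwards keeps the adjusted values in the same order as the raw ones.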
also, to investigate the effect of each amino acid and of their ratios (the top ratios based on p-values), selected as biomarkers, on the probability of the presence or severity of the disease, we used logistic regression, reporting the sensitivity and specificity of the test and the receiver operating characteristic (roc) curve (figure ).
results
in total, amino acids were determined in the studied samples. table s shows the absolute concentrations of the twenty-two amino acids in the participant groups ( healthy cases and vitiligo cases). first, we performed principal component analysis with all samples, which showed that the samples clustered into two completely separated clusters (figure ).
figure . pca shows the homogeneity of the data obtained by hplc-fld. samples are completely grouped into two separate clusters. pc and pc account for % and % of the variation in the data.
next, the distribution of amino acid concentrations was evaluated with the shapiro test. a t-test was also used to show differences in amino acid concentrations between vitiligo and healthy samples. adjusted p-values were calculated by the benjamini-hochberg method. figure (a)
shows a volcano plot in which the horizontal and vertical axes correspond to the log fold change of sample concentrations and the -log adjusted p-values, respectively. as illustrated in figure (c-d), there is a significant increase in cys, pro and glu, while lys, arg, orn, his and gly are decreased in vitiligo patients. figure (b) shows the gini error reduction diagram (mean decrease in accuracy, mean prediction score) obtained from the random forest algorithm with a tree number of . green dots indicate amino acids that are increased in vitiligo and red dots indicate amino acids that are decreased in vitiligo.
figure . (a) volcano plot of amino acid concentration changes in the studied vitiligo samples, (b) gini error reduction diagram with a tree number of , (c-d) box plots of amino acid fluctuations in healthy and vitiligo samples. the red boxes show the metabolite concentrations in healthy individuals (control), while the blue ones show the metabolite concentrations in the patients (vitiligo). the adjusted p-value for each metabolite is given in the figure.
to show the specificity and sensitivity of the studied biomarkers, roc curves were used. an individual roc curve was also plotted for the amino acids with the largest changes (figure (b)). interestingly, cys and lys showed the maximum area under the curve (auc), up to . . for these two amino acids a logistic regression was performed and the corresponding roc curve was drawn. a positive or negative coefficient indicates the role of each selected amino acid in increasing or decreasing the risk of vitiligo.
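the area under the roc curve reported for single amino acids can be read as the probability that a randomly chosen patient has a higher concentration than a randomly chosen control. the sketch below computes the auc that way; it is only an illustration under that assumption, not the procedure used in the study, and the function name `roc_auc` and the concentration values are hypothetical.

```python
from typing import Sequence


def roc_auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs where the case is higher.

    Ties count one half, which makes this equal to the Mann-Whitney
    U statistic divided by the number of pairs.
    """
    if not cases or not controls:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical plasma concentrations in arbitrary units (illustrative only,
# not data from the study).
vitiligo = [5.1, 6.3, 5.8, 7.0]
healthy = [4.2, 5.1, 3.9, 4.8]
auc = roc_auc(vitiligo, healthy)
```

for a marker that is lower in patients than in controls, this value falls below 0.5, and 1 minus the value gives the discriminative ability in the opposite direction.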
next, based on the random forest method, a confusion matrix was developed in which the two groups of our study are completely classified (figure ).
figure . (a) roc curves showing the sensitivity and specificity of the studied amino acids (cys, lys, tyr, orn, pro, glu, leu, and gly), (b) selected roc curves for cys and lys, with the highest variation, (c) confusion matrix based on random forest, in which the groups are completely classified.
following the question of how amino acid concentrations vary within vitiligo cases divided into two categories, more than % and less than %, glu was found to be a reliable biomarker. its concentration (log fc< - . ) is significantly decreased in patients showing more than % (figure (a-b)).
figure . volcano plot for patients with more than and less than % of vitiligo. (a) glu classifies the cases according to vitiligo severity, (b) glu is decreased in patients with more than % of vitiligo.
as ratios of biomarkers, especially amino acids, can be a reliable sign of disease, a volcano plot for different ratios of amino acids in the vitiligo samples was prepared. according to figure , ratios including cys/orn, gly/cys, and … significantly separate the cases of the study.
figure . volcano plot of amino acid variation between the studied cases.
metabolic pathway analysis
metabolite set enrichment analysis (msea) was used to explore the metabolites that are highly enriched and associated with possible metabolic pathways. pathway-associated metabolite and disease-associated metabolite analyses were performed, and they show the majority of the metabolic pathways that are significantly altered in vitiligo cases. using pathway-associated metabolite sets with enrichment analysis, the main affected pathways were detected. pathway impact as checked by metaboanalyst showed that about pathways differ between vitiligo and healthy samples, of which the first pathways are very significant. the following metabolic pathways and cycles were found to be changed in vitiligo cases: arginine and proline metabolism, glycine and serine metabolism, glutathione metabolism, urea cycle, ammonia recycling, glutamate metabolism, alanine metabolism, carnitine synthesis, cysteine metabolism, lysine degradation, beta-alanine metabolism, aspartate metabolism, and methylhistidine metabolism. these are the pathways and metabolic cycles that differed significantly between vcs and hcs. disease-associated metabolite sets were also compared between vcs and hcs. ornithine transcarbamylase deficiency (otc), hyperornithinemia with gyrate atrophy (hoga),
delta-pyrroline- -carboxylate synthase, continuous ambulatory peritoneal dialysis, hyperprolinemia-type ii, short bowel syndrome (under arginine-free), argininosuccinic aciduria (asl), acute seizures, -hydroxyglutaric acidemia, -phosphoglycerate dehydrogenase deficiency, dementia, dicarboxylic aminoaciduria, histidinemia, hyperlysinemia i, family i, phosphoserine aminotransferase deficiency, short-bowel syndrome, and sotos syndrome are the diseases most strongly associated with the metabolite sets found here.
figure . pathway-associated metabolite and disease-associated metabolite analyses
figure . pathway analysis algorithms specified: over-representation analysis, hypergeometric test; pathway topology analysis, relative-betweenness centrality
discussion
to the best of our knowledge, there are few studies on the role of amino acids in vitiligo. they focus on only one or two amino acids and their metabolites that are associated with the melanin production pathway (phenylalanine, tyrosine and glucosamine, trimethylamine, cysteine, homocysteine and thiol). however, no studies have investigated the profile of free amino acids, or changes in these and in the metabolic pathways of vitiligo. amino acids play an important role in detoxification and immune responses by regulating the activation of t lymphocytes, b lymphocytes, natural killer cells, and macrophages ( ), the cellular redox state, gene expression, and lymphocyte proliferation ( ), and the production of antibodies, cytokines, and other cytotoxic compounds ( ).
in most cell types, arginine is produced from citrulline as a precursor and is involved in regulating the activity of the immune system by producing nitric oxide. proline and glutamate give rise to ornithine through pyrroline- -carboxylate (p c). in addition, proline is catabolized by proline oxidase in different organs to produce hydrogen peroxide and p c. the p c reductase-dependent conversion of p c into proline alters the ratio of nadp+ to nadph. the proline-p c cycle regulates the cellular redox state and cell proliferation. in addition, ornithine is converted into citrulline, which regenerates arginine using aspartate. each product of the metabolic pathways in which arginine is involved has a specific function. ornithine, a product of arginine, proline, and glutamate, contributes to the production of glutamate, glutamine, and polyamines, and to mitochondrial integrity. polyamines, the products of arginine and methionine, affect gene expression, dna and protein production, ion channel activity, cell death, antioxidants, cellular activity, and the proliferation and differentiation of lymphocytes. creatine, a product of arginine, methionine, and glycine, has antioxidant, antiviral, and anti-tumor activity. therefore, a concomitant decrease in arginine and ornithine and increase in proline may indicate impaired arginine and proline metabolism and an impaired urea cycle. as a result, the response to oxidative stress and cell damage is disrupted.
there are several serine pathways involving one-carbon metabolism, one of which is glycine synthesis. glycine is involved in synthesizing many important physiological molecules, including purine nucleotides, glutathione, and heme (a cofactor containing an iron atom). in addition, glycine itself is a potent antioxidant that scavenges free radicals. glycine is therefore essential for the proliferation and antioxidative defense of leukocytes, and is an anti-inflammatory, immunomodulatory, and cytoprotective agent. its reduction indicates impaired glycine/serine and glutathione metabolism, which, in turn, disrupts cellular immunity and the response to oxidative stress. ammonia is an important source of nitrogen and a by-product of cellular metabolism. it is assimilated through reductive amination catalyzed by glutamine synthetase and glutamate dehydrogenase, the secondary reactions of which enable other amino acids such as glutamate, proline, and aspartate to obtain this nitrogen directly. glutamate regulates the expression of inducible nitric oxide synthase (inos) in specific tissues and is indirectly involved in regulating the animal immune system. aspartate, acting as a precursor for nucleotide synthesis, contributes to various metabolic pathways and is important for lymphocyte proliferation. further, it is necessary for regenerating arginine from citrulline in activated macrophages and for maintaining the intracellular concentration of arginine to sustain no levels in response to immune challenges. glutamate and aspartate play excitatory roles in the central and
peripheral nervous systems, affecting ionotropic and metabotropic receptors (receptors for peptide or polypeptide hormones and neurotransmitters on the plasma membrane, which play an important role in the immune system). they transport reducing equivalents across the mitochondrial membrane, thereby regulating glycolysis and the cellular redox state through the malate/aspartate shuttle. in addition, alanine, as a major substrate for hepatic glucose synthesis, is a significant energy substrate for leukocytes, thereby affecting immune function. β-alanine is the only non-essential beta amino acid that occurs naturally, and it is formed by various metabolic organs. additionally, they are involved in producing glutamate, aspartate, glutamine, and glycine in part of their metabolic pathways. aspartate and glutamate, along with glutamine, are the main source of energy for enterocytes (intestinal epithelial cells). the results showed that glutamate was increased in the patient group compared to the control group, indicating increased immune system activity and an impaired cellular redox state. methionine is converted into homocysteine (used as a source of sulfur) in the course of its metabolism, and cysteine is produced after homocysteine binds to serine and the intermediate cystathionine is formed. some studies examined homocysteine and thiols in vitiligo patients and found an increase in homocysteine, attributed to a lack of the essential cofactors and folate needed for the activity of methionine synthetase and, consequently, a decrease in its activity and in the methionine regeneration cycle, which led to an increase in cysteine. tyrosine is converted by tyrosinase into dopaquinone, a highly reactive intermediate metabolite that is important for regulating melanogenesis.
as cysteine levels increase, dopaquinone reacts rapidly with cysteine and enters the production of pheomelanin, a common type of melanin pigment found in the hair and skin, whose color changes from yellow to red as its concentration increases. when the cysteine level does not decrease, the reaction does not lead to the production of eumelanin, the pigment whose color changes from light brown to black as its concentration increases [ ]. when thiol levels increase, the production of melanin is impaired. in addition, dynamic thiol/disulfide homeostasis regulates the storage of antioxidants, detoxification, apoptosis, and many signaling mechanisms, including cell division and growth. the results indicated an increase in cysteine and in the ratio of cysteine to ornithine, and a decrease in the ratios of glycine, arginine, ornithine, and lysine to cysteine, in the patient group. thus, impaired cysteine metabolism disrupts pigment production, increases the activity of the immune system, and weakens the defense against oxidative stress owing to deficient production of antioxidant compounds such as taurine and glutathione, which results in melanocyte damage and reduced pigmentation. lysine, which was reduced in the group of vitiligo patients, has multiple catabolic pathways, the main one of which is in the liver, where saccharopine, glutamate, alpha-aminoadipate - semialdehyde, and acetyl-coa are produced [ ]. in the human body, carnitine, which is involved in fatty acid metabolism, is biosynthesized from the amino acids lysine and methionine.
carnitine and its esters help reduce oxidative stress [ ]. in addition, dietary or extracellular lysine can modulate the entry of arginine into leukocytes and the production of no by inos, because it shares transport systems with arginine. histidine is converted into urocanic acid (uca) through one of its metabolic pathways by the enzyme histidine ammonia-lyase. uca is a unique photoreceptor: trans-uca is converted into cis-urocanic acid (cis-uca) by absorbing ultraviolet (uv) radiation from the sun, which controls the activity of the immune system against this radiation. a histidine level above or below the normal state disrupts the function of the skin immune system [ , ]. decreased histidine in the patients triggers the activity of the immune system in response to the existing stimuli, making their skin cells more vulnerable to uv radiation than in the normal state. -methylhistidine is formed by the posttranslational methylation of histidine residues of the major myofibrillar proteins (actin and myosin). in humans, it is associated with a variety of diseases including type diabetes, eosinophilic esophagitis, and kidney disease. in addition, -methylhistidine is associated with the metabolic disorder propionic acidemia. measuring -methylhistidine provides an indicator of the rate at which muscle protein breaks down. it is also a biological marker for meat intake, muscle protein breakdown, and intestinal proteins. the clinical features of vitiligo are classified in different ways, one of which is based on the extent of the spots on the body surface. in the patients, who were divided into two groups with limited extent of spots (less than %) and large spots (greater than %), glutamate decreased with increasing spot extent. given the role of glutamate in regulating protein synthesis and breakdown in the cell and the cell cycle, its lower level in these people can indicate impaired cell
metabolism and increased cell death, resulting in increased complications in people with more severe disease. given these findings, the diseases suggested by the enrichment analysis as associated with the impaired pathways are better understood: ornithine transcarbamylase (otc) deficiency (an inherited disorder that causes ammonia to accumulate in the blood due to deficiency of the transcarbamylase); hyperornithinemia with gyrate atrophy (hoga) (an inherited disorder characterized by progressive vision loss, involving disruption of ornithine aminotransferase, which helps convert ornithine into another molecule, p c, which can in turn be converted into the amino acids glutamate and proline); delta-pyrroline- -carboxylate synthase (difficulty in degrading proline to p c); continuous ambulatory peritoneal dialysis (difficulty in excreting all urea and ammonia and, therefore, the need for dialysis); hyperprolinemia-type ii (problems with proline degradation increase proline and p c); short bowel syndrome (under arginine-free) (the small intestine is required for arginine synthesis, so limited access to essential amino acids in patients with sbs leads to a defect in the intermediates of the urea cycle, ornithine, citrulline, and arginine, as well as a reduction in these amino acids, which may lead to hyperammonemia); and argininosuccinic aciduria (asl), a urea cycle disorder that causes ammonia to accumulate in the blood. the other suggested disorders are all inherited diseases that cause complications and metabolic disorders.
based on the results of this study and of previous studies in this area, reduced melanin production due to increased cysteine, together with autoimmunity and oxidative stress (increased glutamic acid and proline and decreased arginine, glycine, lysine, histidine, and ornithine in patients), can simultaneously damage melanocytes and result in vitiliginous lesions on the skin surface of patients. thus, examining the proposed biomarkers may be helpful in the early diagnosis of at-risk patients; in addition, considering changes in glutamic acid levels as a biomarker can be useful for determining the prognosis of the disease. understanding the role of these biomarkers in vitiligo can also provide the scientific basis for the development of novel therapeutic approaches for this disease.
conclusion
acknowledgments
amorini, a.m., lazzarino, g., di pietro, v., signoretti, s., lazzarino, g., belli, a., tavazzi, b. ( ). severity of experimental traumatic brain injury modulates changes in concentrations of cerebral free amino acids. journal of cellular and molecular medicine ( ), - .
armstrong, a. ( ).
advances in malignant melanoma: clinical and research perspectives. bod–books on demand.
ding, x., du, j., zhang, j. ( ). the epidemiology and treatment of vitiligo: a chinese perspective. pigmentary disorders ( ), - .
douglas, c.a. ( ). amino acid analysis in wines by liquid chromatography: uv and fluorescence detection without sample enrichment. stellenbosch: stellenbosch university.
fekkes, d. ( ). automated analysis of primary amino acids in plasma by high-performance liquid chromatography. in: amino acid analysis. springer, pp. - .
gianfaldoni, s., tchernev, g., lotti, j., wollina, u., satolli, f., rovesti, m., frança, k., lotti, t. ( ). unconventional treatments for vitiligo: are they (un) satisfactory? open access macedonian journal of medical sciences ( ), .
grimes, p., miller, m. ( ). vitiligo: patient stories, self-esteem, and the psychological burden of disease. international journal of women's dermatology ( ), - .
hamidizadeh, n., ranjbar, s., ghanizadeh, a., parvizi, m.m., jafari, p., handjani, f. ( ). evaluating prevalence of depression, anxiety and hopelessness in patients with vitiligo on an iranian population. health and quality of life outcomes ( ), .
jakku, r., thappatla, v., kola, t., kadarla, r.k. ( ). vitiligo-an overview. asian journal of pharmaceutical research and development ( ), - .
liang, l., li, y., tian, x., zhou, j., zhong, l. ( ). comprehensive lipidomic, metabolomic and proteomic profiling reveals the role of immune system in vitiligo. clinical and experimental dermatology ( ), e -e .
sahoo, a., lee, b., boniface, k., seneschal, j., sahoo, s.k., seki, t., wang, c., das, s., han, x., steppie, m. ( ). microrna- regulates oxidative phosphorylation and energy metabolism in human vitiligo. journal of investigative dermatology ( ), - .
singh, r.k., lee, k.m., vujkovic-cvijin, i., ucmak, d., farahnik, b., abrouk, m., nakamura, m., zhu, t.h., bhutani, t., wei, m. ( ). the role of il- in vitiligo: a review.
autoimmunity reviews ( ), - .
speeckaert, r., speeckaert, m., de schepper, s., van geel, n. ( ). biomarkers of disease activity in vitiligo: a systematic review. autoimmunity reviews ( ), - .
wu, j.-l., yu, s.-y., wu, s.-h., bao, a.-m. ( ). a sensitive and practical rp-hplc-fld for determination of the low neuroactive amino acid levels in body fluids and its application in depression. neuroscience letters , - .
zheleva, a., nikolova, g., karamalakova, y., hristakieva, e., lavcheva, r., gadjeva, v. ( ). comparative study on some oxidative stress parameters in blood of vitiligo patients before and after combined therapy. regulatory toxicology and pharmacology , - .
distinct roles and actions of pdi family enzymes in catalysis of nascent-chain disulfide formation chihiro hirayama , kodai machida # , kentaro noi # , tadayoshi murakawa , masaki okumura , , teru ogura , , hiroaki imataka , and kenji inaba * institute of multidisciplinary research for advanced materials, tohoku university, sendai, miyagi - , japan graduate school of engineering, university of hyogo, himeji, hyogo - , japan institute for nanoscience design, osaka university, toyonaka, osaka - , japan graduate school of life science and technology, tokyo institute of technology, yokohama, kanagawa, - , japan frontier research institute for interdisciplinary sciences, tohoku university, sendai, miyagi - , japan institute of molecular embryology and genetics, kumamoto university, kumamoto, kumamoto - , japan faculty of life sciences, kumamoto university, kumamoto - , japan # these authors contributed equally to this work *correspondence & lead contact: kenji inaba, institute of multidisciplinary research for advanced materials, tohoku university, katahira - - , aoba-ku, sendai, miyagi - , japan e-mail: kenji.inaba.a @tohoku.ac.jp tel: + - - - fax: + - - - orcid: - - - running title: nascent-chain disulfide bond formation (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . abstract the mammalian endoplasmic reticulum (er) harbors more than members of the protein disulfide isomerase (pdi) family that act to maintain proteostasis. herein, we developed an in vitro system for directly monitoring pdi- or erp -catalyzed disulfide bond formation in ribosome-associated nascent chains (rnc) of human serum albumin.
the results indicated that erp introduced disulfide bonds into nascent chains with short segments exposed outside the ribosome exit site more efficiently than pdi. single-molecule analysis by high-speed atomic force microscopy further revealed that pdi binds nascent chains persistently, forming a stable face-to-face homodimer, whereas erp binds for a shorter time in monomeric form, indicating their different mechanisms for substrate recognition and disulfide bond introduction. similarly to erp , a pdi mutant with an occluded substrate-binding pocket displayed shorter-time rnc binding and higher efficiency in disulfide introduction than wild-type pdi. altogether, erp serves as a more potent disulfide introducer, especially during the early stages of translation, whereas pdi can catalyze disulfide formation in rnc when longer nascent chains emerge from the ribosome. keywords nascent chain, protein disulfide isomerase, erp , disulfide bond, co-translational folding, high-speed atomic force microscopy, er proteostasis introduction over billions of years of evolution, living organisms have developed ingenious mechanisms to promote protein folding (hartl et al, ). the oxidative network catalyzing protein disulfide bond formation in the endoplasmic reticulum (er) is a prime example.
while canonical protein disulfide isomerase (pdi) and er oxidoreductin- (ero ) were previously postulated to constitute a primary disulfide bond formation pathway (araki & inaba, ; mezghrani et al, ; tavender & bulleid, ), more than different pdi family enzymes and multiple pdi oxidases besides ero have recently been identified in the mammalian er, suggesting the development of highly diverse oxidative networks in higher eukaryotes (nguyen et al, ; schulman et al, ; tavender et al, ). each pdi family enzyme is likely to play a distinct role in catalyzing the oxidative folding of different substrates, concomitant with some functional redundancy, leading to the efficient production of a wide variety of secretory proteins with multiple disulfide bonds (bulleid & ellgaard, ; okumura et al, ; sato & inaba, ). our previous in vitro studies using model substrates such as reduced and denatured bovine pancreatic trypsin inhibitor (bpti) and ribonuclease a (rnase a) demonstrated that different pdi family enzymes participate in different stages of oxidative protein folding, resulting in the accelerated folding of native enzymes (kojima et al, ; sato et al, ). multiple pdi family enzymes cooperate to synergistically increase the speed and fidelity of disulfide bond formation in substrate proteins. however, whether mechanistic insights gained by in vitro experiments using full-length substrates are applicable to real events of oxidative folding in the er remains an important question. indeed, several previous studies demonstrated that newly synthesized
polypeptide chains undergo disulfide bond formation and isomerization co-translationally, presumably via catalysis by specific pdi family members (kadokura et al, ; molinari & helenius, ; robinson & bulleid, ; robinson et al, ; robinson et al, ). furthermore, nascent chains play important roles in their own quality control by modulating the translation speed to increase the yield of native folding; if a nascent chain fails to fold or complete translation, then the resultant aberrant ribosome-nascent chain complexes are degraded or destabilized (buhr et al, ; chadani et al, ; matsuo et al, ). these observations suggest that understanding real events of oxidative protein folding in cells requires systematic analysis of how pdi family enzymes act on nascent polypeptide chains during synthesis by ribosomes. to this end, we herein developed an experimental system for directly monitoring disulfide bond formation in ribosome-associated human serum albumin (hsa) nascent chains of different lengths from the n-terminus. the resultant ribosome-nascent chain complexes (rncs) were reacted with two ubiquitously expressed pdi family members, er-resident protein (erp ) and canonical pdi. these two enzymes were previously shown to have distinct roles in catalyzing oxidative protein folding: erp engages in rapid but promiscuous disulfide bond introduction during the early stages of folding, while pdi serves as an effective proofreader of non-native disulfides during the later stages (kojima et al., ; sato et al., ). the subsequent maleimidyl polyethylene glycol (mal-peg) modification of free cysteines and bis-tris (ph . ) page analysis enabled us to detect the oxidation status of the hsa nascent chains conjugated with transfer rna (trna). using high-speed atomic force microscopy (hs-afm), we further visualized pdi and erp acting on the rncs
at the single-molecule level. collectively, the results indicated that although both erp and pdi could introduce a disulfide bond into the ribosome-associated hsa nascent chains, they required different lengths of the hsa segment exposed outside the ribosome exit site, and displayed different mechanisms of action against the rnc. the present systematic in vitro study using rnc containing different lengths of hsa nascent chains mimics co-translational disulfide bond formation in the er, and the results provide a framework for understanding the mechanistic basis of oxidative nascent-chain folding catalyzed by pdi family enzymes. results the efficiency of disulfide bond introduction into hsa nascent chains by pdi/erp to investigate whether pdi family enzymes can introduce disulfide bonds into a substrate during translation, we first prepared rncs in vitro. for this purpose, we made use of a cell-free protein translation system reconstituted with eukaryotic elongation factors and , eukaryotic release factors and (erf and erf ), aminoacyl-trna synthetases, trnas, and ribosome subunits, developed previously by imataka and colleagues (machida et al, ). hsa was chosen as a model substrate for the following reasons. firstly, the three-dimensional structure of hsa has been solved at high resolution (sugio et al, ), providing information on the exact location of disulfide bonds in its native structure. secondly, native-state hsa contains an unpaired cysteine, cys , near the n-terminal region, which has the potential to form a non-native disulfide bond with one of the subsequent cysteines, serving as a good indicator of whether a non-native disulfide is introduced by erp or pdi during the early stage of
translation. thirdly, the overall conformation and kinetics of disulfide bond regeneration were characterized for reduced full-length hsa (lee & hirose, ), which is beneficial for discussing similarities and differences in post- and co-translational oxidative folding. fourthly, no n-glycosylation sites are contained in the first amino acids of hsa, implying that hsa nascent chains synthesized by the cell-free system are equivalent to those synthesized in the er in regard to n-glycosylation. finally, the involvement of pdi family enzymes in intracellular hsa folding has been demonstrated (koritzinsky et al, ; rutkevich et al, ; rutkevich & williams, ), ensuring the physiological relevance of the present study. to stall the translation of hsa at specified sites, a uorf arrest sequence (alderete et al, ) was inserted into appropriate sites of the expression plasmid (fig a). we first prepared two versions of the rnc containing different lengths of hsa nascent chains: rnc -aa and rnc -aa. since the ribosome exit tunnel accommodates a polypeptide chain of ~ amino acid (aa) residues (zhang et al, ), the n-terminal residues of hsa (excluding the n-terminal -aa pro-sequence) are predicted to be exposed outside the ribosome exit tunnel in rnc -aa, including cys and cys (fig a). in the rnc -aa construct, the n-terminal residues of hsa, including cys as well as cys /cys , are predicted to emerge from the ribosome (fig a). notably, cys and cys form a native disulfide bond, whereas cys is unpaired in the native structure of hsa domain i. when rnc -aa was employed as a substrate, neither pdi nor erp could efficiently introduce a disulfide bond into the nascent chain (fig c and d).
however, both enzymes introduced a disulfide bond into rnc -aa with higher efficiency than into rnc -aa (fig e and f), suggesting that the length of the exposed hsa segment or the distance of a pair of cysteines from the ribosome exit site is critical for disulfide bond introduction by pdi and erp . for either construct, a faint band was seen between the bands of ‘no ss’ and ‘ ss’, and this band was even fainter without gsh/gssg (the second lane from the left) and tended to get stronger at late time points. presumably, this band represents a species in which one of the free cysteines is glutathionylated, and the species increased gradually in the course of the reaction. of note, erp introduced a disulfide bond into rnc -aa at a much higher rate than pdi, indicating that erp serves as a more competent disulfide bond introducer to rncs than pdi (fig f). the remarkable difference in disulfide bond introduction efficiency by these two enzymes seems unlikely to be explained simply by the different number of redox-active trx-like domains in pdi (two) and erp (three) (fig b). also, the redox states in the presence of mm gsh and . mm gssg are similar between these two enzymes (fig ev a and ev b), suggesting their comparable redox potentials. thus, the different ability of erp and pdi to introduce a disulfide into -aa is likely caused by other factors such as different structural features and different mechanisms of substrate recognition, as discussed below. next, to identify which cysteine pair forms a disulfide bond in rnc -aa, we constructed three cysteine mutants in which either cys , cys , or cys was mutated to alanine (fig a).
the assays using the mutants showed that whereas pdi was unable to introduce a disulfide bond into rnc -aa c a and c a (fig b, top and middle), the enzyme introduced a cys -cys non-native disulfide bond into rnc -aa c a (fig b, bottom), at almost the same rate as the generation of the ‘ ss’ species in -aa (fig e and f). pdi could not introduce a cys -cys native disulfide bond, presumably because this cysteine pair is located too close to the ribosome exit site (see also fig b and c). conversely, the slow but possible formation of a cys -cys non-native disulfide in aa by pdi suggests that the distance between a cysteine pair of interest and the ribosome exit site is key to allowing the enzyme to catalyze disulfide bond introduction into rncs. considering the different locations of the cys -cys and cys -cys pairs on rnc -aa, a distance of ~ residues from the ribosome exit site appears to be necessary for the pdi-catalyzed reaction (see also the discussion). in contrast to pdi, erp could introduce a native disulfide bond into rnc -aa c a (fig c, top). like pdi, erp also introduced a non-native disulfide bond between cys and cys into rnc -aa c a, but its efficiency was lower than that of a cys -cys native disulfide (fig c, bottom). no disulfide bond was formed between cys and cys by either erp or pdi (fig c, middle), presumably due to the considerable spatial separation of these two cysteines. based on these results, we concluded that for efficient disulfide bond introduction into rncs, erp requires an intermediary polypeptide segment with a shorter distance between a cysteine pair of interest and the ribosome exit site than pdi.
we here note that erp -catalyzed generation of the ‘ ss’ species was faster in -aa than in -aa c a (fig f and c). this observation may suggest the occurrence of cys -mediated disulfide bond formation in -aa, namely, the formation of a cys -cys non-native disulfide and, possibly, its rapid isomerization to a cys -cys native disulfide. accessibility of pdi/erp to cysteines on the ribosome-hsa nascent chain complex to examine the accessibility of pdi and erp to cys residues on rnc -aa, we constructed three rnc -aa mono-cys mutants in which either cys , cys , or cys on the hsa nascent chain was retained, and investigated whether a mixed disulfide could be formed between the rnc -aa mutant and a trapping mutant of pdi or erp in which all cxxc redox-active sites were mutated to cxxa. both pdi and erp formed a mixed disulfide bond with cys and cys on rnc -aa with high probability, but covalent linkages to cys were marginal (fig d and e). the results suggest that the redox-active sites of pdi and erp could gain access to cys and cys , but to a much lesser extent, to cys , probably due to steric collision with the ribosome. nevertheless, erp efficiently introduced a native disulfide bond between cys and cys (fig c, top), presumably because erp first attacked cys on the hsa nascent chain, and the resultant mixed disulfide was subjected to nucleophilic attack by cys (fig f, right). by contrast, the mixed disulfide between pdi and cys on the hsa nascent chain seems unlikely to be attacked by cys , probably due to steric collision between pdi and the ribosome (fig f, left).
in line with this idea, pdi adopts a u-like overall conformation with restricted movements of four thioredoxin (trx)-like domains (tian et al, ; wang et al, ), whereas erp forms a highly flexible v-shape conformation composed of three trx-like domains and two long (~ aa) interdomain linkers (kojima et al., ). correlations between cysteine accessibility and the efficiency of disulfide bond introduction by pdi/erp based on the results presented above, we believe that the distance between cysteines of interest and the ribosome exit site is critical for efficient disulfide introduction by pdi and erp . to test this hypothesis, we increased the distance of the cys -cys pair from the ribosome exit site by inserting an extended polypeptide segment composed of [sg] or [sg] repeat immediately after cys on rnc -aa c a (fig a), and investigated the effects of the insertions on the efficiency of disulfide bond formation. while pdi was unable to introduce a cys -cys native disulfide into rnc -aa c a (fig b, top), insertion of a [sg] repeat allowed this reaction, and nearly % of -aa c a was disulfide-bonded within a reaction time of s (fig b, upper and c). the insertion of a longer repeat [sg] further promoted disulfide bond formation (fig b, lower and c). a similar enhancement following [sg] repeat insertion was observed for erp -catalyzed reactions. however, erp exhibited a striking difference from pdi: insertion of a [sg] repeat was long enough to introduce a cys -cys native disulfide into rnc -aa c a within s, and insertion of a [sg] repeat gave only a small additional enhancement (fig d and e).
thus, the presence of a disordered or extended segment of ~ aa (asp phe + [sg] repeat) between a cysteine pair of interest and the ribosome exit site was necessary and sufficient for erp to generate a cys -cys disulfide rapidly, whereas pdi required a longer segment of ~ aa (asp phe + [sg] repeat) in this intermediary region for efficient introduction of a cys -cys disulfide. thus, erp seems to be more capable of introducing a disulfide bond near the ribosome exit site than pdi. in other words, erp likely has a higher potential than pdi to introduce a disulfide bond into the hsa nascent chain during the earlier stages of translation. to verify that cys -cys disulfide formation facilitated by [sg] repeat insertion was ascribed to higher accessibility of pdi/erp to cys , we again investigated mixed disulfide bond formation between trapping mutants of pdi/erp and each cysteine on rnc -aa following [sg] repeat insertion. both pdi and erp formed a mixed disulfide with all cysteines including cys (fig f and g), indicating that there is a correlation between the accessibility of pdi/erp to a target pair of cysteines and the efficiency of disulfide bond introduction by the enzymes. disulfide bond introduction into a longer hsa nascent chain by pdi/erp in addition to the [sg]-repeat insertion, we examined the effect of natural hsa sequence extension on pdi- or erp -mediated disulfide formation. for this purpose, we prepared rnc -aa in which the n-terminal amino acids of hsa (excluding the n-terminal -aa pro-sequence), including cys , cys , cys , and cys , are predicted to emerge from the ribosome (fig a).
with this construct, however, we had a technical problem with detection of the reduced species, because mal-peg modification of four cysteines greatly diminished the gel-to-membrane transfer efficiency. we overcame this problem by using photo-cleavable mal-peg (peg-pcmal) and irradiating the sds gel with uv light after the gel electrophoresis and before the membrane transfer. consequently, we observed that both pdi and erp introduced a disulfide bond into -aa (fig b), but the efficiency was lower than that into -aa (fig e and f), although a longer polypeptide chain is exposed outside the ribosome exit site in rnc -aa. thus, the effect of natural sequence extension was opposite to that of [sg]-repeat insertion. formation of some higher-order structure or exposure of another cysteine may somehow prevent pdi and erp from introducing a disulfide bond into rnc -aa. thus, a longer polypeptide chain exposed outside the ribosome does not always lead to a higher disulfide formation rate. rather, it is suggested that pdi and erp can introduce a disulfide bond into a nascent chain with higher efficiency when the necessary and minimum length emerges. given that four cysteines are exposed outside the ribosome in rnc -aa, we next investigated whether pdi and erp can catalyze nascent-chain disulfide formation additively or synergistically. the mixture of pdi and erp generated a ‘ ss’ species, but not a ‘ ss’ species, like pdi or erp alone (fig b and c). notably, the presence of pdi inhibited erp -mediated disulfide formation, possibly due to its competition with erp for binding to rnc -aa. thus, neither an additive nor a synergistic effect was observed (fig b and c).
in this regard, our previous observation of the synergistic cooperation of pdi and erp in rnase a oxidative folding (sato et al., ) did not hold for the ribosome-associated hsa nascent chain. single-molecule analysis of erp by high-speed atomic force microscopy to explore the mechanisms by which pdi and erp recognize and act on rncs at the molecular level, we employed hs-afm (kodera et al, ; noi et al, ; okumura et al, ; uchihashi et al, ). while our previous hs-afm analysis revealed that pdi molecules form homodimers in the presence of unfolded substrates (okumura et al., ), the structure and dynamics of erp have not been analyzed using this experimental approach. therefore, we first observed erp molecules alone by immobilizing the n-terminal his-tag on a co + -coated mica surface. afm images revealed various overall shapes of erp (fig a), and some particle images clearly demonstrated the presence of three thioredoxin (trx)-like domains in erp (fig a, left). to assess the overall structures of erp , we calculated the circularity of each molecule and performed statistical analysis (uchihashi et al., ). circularity is a measure of how circular the outline of an observed molecule is, defined by the equation $4\pi s/l^{2}$, where l and s are the contour length of the outline and the area surrounded by the outline, respectively. thus, a circularity of 1.0 indicates a perfect circle, and values < 1 indicate a more extended conformation. statistical analysis based on circularity classified randomly chosen erp particles into two major groups: opened v-shape and round/compact o-shape (fig a).
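the circularity measure described above can be illustrated with a minimal sketch. it assumes the standard form 4πs/l² (area s, contour length l) and assumes that a particle outline is available as an ordered list of (x, y) vertices; the function names and the polygon representation are hypothetical and are not taken from the authors' analysis software.

```python
import math


def polygon_area(points):
    """Area enclosed by a closed polygon, via the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the outline
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def polygon_perimeter(points):
    """Contour length of a closed polygon."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def circularity(points):
    """Circularity 4*pi*S/L^2: 1 for a perfect circle, smaller when extended."""
    s = polygon_area(points)
    l = polygon_perimeter(points)
    return 4.0 * math.pi * s / (l * l)


# hypothetical outlines: a near-circle, a square, and an elongated rectangle
near_circle = [
    (math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
    for k in range(360)
]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
elongated = [(0, 0), (10, 0), (10, 1), (0, 1)]
```

under these assumptions a compact, round outline scores close to 1 and an elongated outline scores lower, which is the property used to separate o-shape from v-shape particles.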
histograms with gaussian fitting curves indicated that ~ % of erp molecules adopted v-shape conformations while ~ % adopted o-shape conformations (fig b). there was no large difference in height between these two conformations, suggesting that the three trx-like domains of erp are arranged within the same plane in either conformation. successive afm images acquired every ms revealed that erp adopted an open v-shape conformation during nearly % of the observation time, while the protein also adopted an o-shape conformation occasionally (fig c, d, e and movie ev ). the histogram calculated from the time-course snapshots was similar to that calculated from images of molecules at a certain timepoint (fig b and e). importantly, structural insights gained by hs-afm analysis are in good agreement with those from small-angle x-ray scattering (saxs) analysis: both analyses consistently indicate the coexistence of a major population of molecules with an open v-shape and a minor population with a compact o-shape (kojima et al., ). single-molecule analysis of pdi/erp acting on -aa rnc by hs-afm pdi and erp are predicted to bind rncs transiently during disulfide bond introduction, but transient interactions would make it harder to observe and analyze the mode of pdi/erp binding to rncs. more practically, at least mins are required to
we therefore constructed hsa -aa rnc with cys , cys , and cys mutated to ala (hereafter referred to as -aa ca rnc), with the intension of trapping rnc molecules bound to pdi/erp . after testing several rnc immobilization methods, we chose to immobilize rnc on a ni + -coated mica surface. as a result, most rnc molecules were observed to lie sideways on the mica surface, while nascent chains were difficult to visualize, probably due to their flexible and extended structural nature (fig a). when oxidized pdi or erp were added to onto the rnc-immobilized mica surface, pdi/erp -like particles were observed in the peripheral region of ribosomes. when no-chain rnc (nc-rnc), comprising only the n-terminal flag tag and the subsequent uorf but no segment from hsa, was immobilized on the mica surface, far fewer particles were observed near rncs (within Å from the outline of ribosomes) by hs-afm despite the presence of pdi/erp (fig ev a and ev b). these results confirm that we successfully observed pdi/erp molecules acting on hsa nascent chains associated with ribosomes. notably, the hs-afm analysis revealed that pdi bound rncs in both monomeric and dimeric forms at an approximate ratio of : (fig b), as reported previously for reduced and denatured bpti and rnase a as substrates (okumura et al., ). thus, pdi likely recognizes hsa nascent chains in a similar manner to full-length substrates. statistical analysis of rnc binding rates revealed that whereas most monomeric pdi molecules ( / molecules) bound rnc for s or shorter (fig (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . d, fig ev a and movie ev ), most homodimeric pdi molecules ( / molecules) bound rnc for s or longer (fig d, fig ev b and movie ev ). 
by contrast, erp molecules in the periphery of rncs were only present in monomeric form (fig c). importantly, nearly % ( / molecules) of erp molecules bound rnc for to s (fig d, fig ev c and movie ev ), while a smaller portion ( / molecules) bound rnc for ~ s (fig d). it is also notable that significant portion of pdi and erp molecules bound ribosomes for < s. this may indicate that pdi/erp binds or approaches rncs only transiently possibly via diffusion, without tight interactions. the histogram of the distance between the edge of ribosomes and the center of ribosome-neighboring pdi/erp molecules indicated that both pdi and erp bound rncs at positions ~ nm distant from ribosomes with a single-gaussian distribution with a half width of ~ nm (fig e), suggesting that both enzymes recognize similar sites of the hsa nascent chain. given that the distance between adjacent amino acids is approximately . Å along an extended strand, cys , cys , and cys are calculated to be Å, Å, and Å distant from the ribosome exit site, respectively. the distributions of pdi and erp molecules bound to rnc -aa seem consistent with their accessibility to cys and cys , but not to cys , as revealed by their mixed disulfide formation with rnc -aa (fig d and e). role of the pdi hydrophobic pocket in oxidation of the hsa nascent chain it is widely known that the pdi b’ domain contains a hydrophobic pocket that acts as a primary substrate-binding site (klappa et al, ). to examine the involvement of the hydrophobic pocket in pdi-catalyzed disulfide bond formation in the hsa nascent chain, we mutated i , one of the central residues that constitute the hydrophobic (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . 
pocket, to ala, and compared the efficiency of disulfide bond introduction into rnc -aa between wild-type (wt) and mutant i a proteins. in this mutant, the x-linker flanked by b’ and a’ domains tightly binds the hydrophobic pocket, unlike in wt, thereby preventing pdi from tightly binding an unfolded substrate (bekendam et al, ; nguyen et al, ). erp , another primary member of the pdi family, has a u-shape domain arrangement similar to pdi, but does not contain the hydrophobic pocket in the b’ domain. for comparison, we also monitored erp -catalyzed disulfide introduction into rnc -aa. despite the occlusion or lack of the hydrophobic substrate-binding pocket, both pdi i a and erp were found to introduce a disulfide bond into rnc -aa at a higher rate than pdi wt (fig a and b). this result suggests that the hydrophobic pocket is involved in binding the hsa nascent chain, but that this binding slows down, rather than accelerates, disulfide introduction into a nascent chain. to further explore the mechanism by which pdi i a introduced a disulfide bond at a faster rate than pdi wt, we analyzed its binding to rnc using hs-afm. the analysis revealed that, while nearly one-third of pdi i a molecules formed dimers in the presence of rnc -aa like pdi wt, the mutant dimers bound rnc for a shorter time than the wt dimers (fig c and movie ev ). thus, the rnc-binding time of pdi i a showed a similar distribution to that of erp (fig d and movies ev and ev ), which seems consistent with the higher disulfide introduction efficiency of pdi i a than that of pdi wt. pdi i a also bound rncs at positions ~ nm distant from the ribosome with a single-gaussian distribution (fig e), suggesting that pdi i a recognizes similar sites of the hsa nascent chain as pdi and erp .
discussion
a number of studies have recently investigated co-translational oxidative folding in the er (kadokura et al., ; robinson et al., ; robinson et al., ). the present study showed that while both pdi and erp can introduce a disulfide bond into a nascent chain co-translationally, erp catalyzes this reaction more efficiently than pdi and requires a shorter nascent chain segment exposed outside the ribosome exit. thus, erp appears to be capable of introducing a disulfide bond into a nascent chain at earlier stages of translation than pdi. the efficient introduction of a cys -cys native disulfide on rnc -aa by erp (fig ) suggests that a separation of ~ aa residues between a c-terminal cysteine on a nascent chain and the ribosome exit site (i.e., residues - ) is sufficient for erp to catalyze this reaction (fig ). when a nascent chain was elongated by the insertion of [sg]-repeat sequences, pdi could also introduce the native disulfide bond into rncs to some extent (fig b and c). thus, pdi appears to act on a nascent chain to introduce a disulfide bond when the distance between a c-terminal cysteine on a nascent chain and the ribosome exit site reaches ~ aa residues (i.e., residues - + [sg] repeat; fig ).
disulfide bond formation in partially er-exposed nascent chains was indeed observed with the adam disintegrin domain, which has a dense disulfide bonding pattern and little defined structure (robinson et al., ). thus, disulfide bond formation seems to be allowed before the higher order structure is defined in a nascent chain. this could be the case with a cys -cys nonnative disulfide and a cys -cys native disulfide on rnc -aa, since the n-terminal -residue hsa fragment alone is unlikely to fold into a globular native-like structure, although the fragment spanning residues to is predicted to form an α-helix according to the hsa native structure.
in contrast, some proteins including β -microglobulin (β m) and prolactin have been shown to form disulfide bonds only after a folding domain is fully exposed to the er or a polypeptide chain is released from the ribosome, suggesting that their disulfide bond formation is folding-driven. notably, pdi binds β m when the n-terminal ~ residues of β m are exposed to the er, and completes disulfide bond introduction at even later stages of translation (robinson et al., ). thus, pdi has been demonstrated to engage in disulfide bond formation during late stages of translation or after translation in the er.
regarding mechanistic insight, the present hs-afm analysis visualized pdi and erp acting on nascent chains at the single-molecule level. we found that pdi forms a face-to-face homodimer that binds a nascent chain, as is the case with reduced and denatured full-length substrates (okumura et al., ). on the other hand, erp maintains a monomeric form while binding a nascent chain. interestingly, the pdi dimer binds a nascent chain much more persistently than the pdi monomer and erp , suggesting that the pdi dimer holds a nascent chain tightly inside its central hydrophobic cavity. in agreement with this observation, a hydrophobic-pocket mutant (i a) of pdi bound a nascent chain for a shorter time and introduced a disulfide bond into a nascent chain more rapidly than the wt enzyme, as was the case with erp . in this context, pdi competed with erp for acting on rnc -aa, and thereby inhibited erp -mediated disulfide introduction (fig and fig ). thus, pdi family enzymes do not always work synergistically to accelerate oxidative protein folding, but may inhibit each other during co-translational disulfide bond formation.
how the er membrane translocon channel is involved in co-translational oxidative folding catalyzed by pdi family enzymes remains an important question. it is possible that pdi and erp form a supramolecular complex with ribosomes and the sec translocon channel via a nascent chain. indeed, pdi was previously identified as a luminal protein that was in close contact with translocating nascent chains (klappa et al, ). additionally, the oligosaccharyltransferase complex (harada et al, ) and an er chaperone calnexin (farmery et al, ) have been reported to interact with the ribosome-associated sec channel to catalyze n-glycosylation and folding of nascent chains in the er, respectively. in this regard, it will be interesting to examine the close co-localization of pdi/erp with the sec channel in the presence or absence of nascent chains in transit into the er lumen by super-resolution microscopy or other tools. systematic studies with a wider range of substrates of different lengths from the ribosome exit site and different numbers of cysteine pairs, and with other pdi family members potentially having different functional roles, will provide further mechanistic and physiological insights into co-translational oxidative folding and protein quality control in the er.
materials & methods
construction of hsa plasmids
dna fragments encoding specific regions ( -aa, n-terminal pro-sequence -aa + the subsequent -aa; -aa, n-terminal pro-sequence -aa + the subsequent -aa; -aa, n-terminal pro-sequence -aa + the subsequent -aa) of hsa were amplified by pcr with appropriate primers and inserted into the puc-t -hcv-flag- a-uorf expression plasmid, as described in machida et al. ( ).
the a region was replaced with the amplified fragments to generate puc-t -hcv-flag-hsa ( -aa or -aa)-uorf . rnc -aa c a/c a/c a and mono-cys mutants were constructed using the quikchange method with appropriate primers (table ). rnc -aa c a with [sg] or [sg] repeats were constructed by the prime star max (takara bio inc., japan) method using appropriate primers (table ).
expression and purification of pdi and erp
overexpression and purification of human pdi and erp , and their mutants, were performed as described previously (kojima et al., ; sato et al., ). an erp trapping mutant with a cxxa sequence in all trx-like domains was constructed by the quikchange method using appropriate sets of primers.
preparation of rncs using a translation system reconstituted with human factors
a cell-free translation system was reconstituted with eef ( m), eef ( m), erf / ( . m), aminoacyl-trna synthetases ( . g/l), trnas ( g/l), s ribosomal subunit ( . m), s ribosomal subunit ( . m), ppa ( . m), amino acids mixture ( . mm) and t rna polymerase ( . g/l) (machida et al., ). we added . µl template plasmid ( . mg/ml) into µl of this cell-free system, and the mixture was incubated for at least . h at c. after hkms buffer (comprising mm hepes-koh (ph . ), mm kcl, mm mg(oac) , and . m sucrose) was added, samples were ultra-centrifuged at , g overnight at c to recover the rnc as a pellet. after removing the supernatant, pellets were resuspended in hkm buffer comprising mm hepes-koh (ph . ), mm kcl, and mm mg(oac) .
monitoring pdi- and erp -mediated disulfide bond introduction into rncs
the rnc suspension prepared as described above was mixed with pdi or erp ( . m each) and glutathione/oxidized glutathione (gsh/gssg; . mm: . mm; nacalai tesque, inc., japan). aliquots were collected after incubation at c for the indicated times, and reactions were quenched with mal-peg k ( mm; nof corporation, japan) for rnc -aa and rnc -aa. after cysteine alkylation at room temperature for min, samples were separated by % bis-tris (ph . ) page (thermo fisher scientific k.k., japan) in the presence of the reducing reagent β-mercaptoethanol (β-me; % v/v; nacalai tesque, inc., japan). after transferring onto a polyvinylidene fluoride (pvdf) membrane (merck kgaa, darmstadt, germany), bands on the membrane were visualized using chemi-lumi one ultra (nacalai tesque, inc., japan) and a chemidoc™ imaging system (bio-rad laboratories, inc., ca, usa). signal intensity was quantified using imagelab software (bio-rad laboratories, inc., ca, usa).
for rnc -aa, reactions were quenched with peg-pcmal (dojindo, japan). after cysteine alkylation at room temperature for min, samples were separated by % bis-tris (ph . ) page (thermo fisher scientific k.k., japan) in the presence of the reducing reagent β-me ( % v/v). after gel electrophoresis, the gel was subjected to uv irradiation ( nm, w) for min. the subsequent procedures were the same as described above.
monitoring intermolecular disulfide bond linkage between pdi/erp and ribosome-hsa nascent chain complexes
to detect the intermolecular disulfide bond linkage between pdi/erp and the ribosome-hsa nascent chain complex, we employed rnc -aa mono-cys mutants
retaining one of cys , cys , or cys . the rnc suspension prepared as described above was mixed with a pdi or erp trapping mutant ( m each) and diamide ( µm). aliquots were collected after incubation at c for min, and reactions were quenched with n-ethylmaleimide ( mm; nacalai tesque, inc., japan). samples were analyzed by nu-page and western blotting as described above.
high-speed atomic force microscopy imaging
the structural dynamics of pdi and erp were probed using a high-speed afm instrument developed by toshio ando’s group (kanazawa university). data acquisition for erp was performed as described previously (okumura et al., ). briefly, his -tagged erp was immobilized on a co + -coated mica surface through the n-terminal his-tag. to this end, a droplet ( l) containing nm erp was loaded onto the mica surface. after a min incubation, the surface was washed with tris buffer ( mm tris-hcl ph . , mm nacl). single-molecule imaging was performed in tapping mode (spring constant, ~ . n/m; resonant frequency, . – mhz; quality factor in water, ~ ) and analyzed using kodec . . . software developed by toshio ando’s group (kanazawa university). afm observations were made in fixed imaging areas ( × Å ) at a scan rate of . s/frame. each molecule was observed separately on a single frame with the highest pixel setting ( × pixels). cantilevers (olympus, tokyo, japan) were – m long, m wide, and nm thick. for afm imaging, the free oscillation amplitude was set to ~ nm, and the set-point amplitude was around % of the free oscillation amplitude. the estimated tapping force was < pn. a low-pass filter was used to remove noise from acquired images. the area of a single erp molecule in each frame was calculated using labview (national
instruments, austin, tx, usa) with custom-made programs.
to observe the binding of pdi/erp to rncs by hs-afm, rncs were immobilized on a ni + -coated mica surface via electrostatic interactions. to this end, a droplet ( l) containing rncs was loaded onto the mica surface. after a min incubation, the surface was washed with hsa buffer comprising mm hepes-koh ph . , mm kcl, and mm mg(oac) . pdi/erp lacking the n-terminal his -tag was added to the rnc-immobilized mica surface at a final concentration of nm. measurements were performed under the same conditions described above.
acknowledgments
this work was supported by grants-in-aid for scientific research from mext to ki ( and h ), the nagase science technology foundation (k.i.) and the mitsubishi foundation (k.i.). this work was also supported by a grant-in-aid for jsps fellows (grant number j to c.h.) and a grant-in-aid of tohoku university, division for interdisciplinary advanced research and education (to c.h.).
author contributions
c.h. and t.m. developed an experimental system for directly monitoring co-translational disulfide bond formation. k.m. and h.i. developed and prepared the cell-free protein translation system reconstituted with human factors. c.h. prepared various plasmids. c.h. and m.o. purified pdi and erp , and their mutants. c.h. and k.n. performed hs-afm measurements and analyses. c.h., k.n., m.o. and t.o. discussed the results of hs-afm. k.i. supervised the work. c.h. and k.n. prepared the figures. c.h. and k.i. wrote the manuscript. all of the authors discussed the results and approved the manuscript.
conflict of interests
we declare that there are no competing interests related to this work.
references
alderete jp, jarrahian s, geballe ap ( ) translational effects of mutations and polymorphisms in a repressive upstream open reading frame of the human cytomegalovirus ul gene. j virol : -
araki k, inaba k ( ) structure, mechanism, and evolution of ero family enzymes. antioxidants & redox signaling : -
bekendam rh, bendapudi pk, lin l, nag pp, pu j, kennedy dr, feldenzer a, chiu j, cook km, furie b et al ( ) a substrate-driven allosteric switch that enhances pdi catalytic activity. nature communications :
buhr f, jha s, thommen m, mittelstaet j, kutz f, schwalbe h, rodnina mv, komar aa ( ) synonymous codons direct cotranslational folding toward different protein conformations. molecular cell : -
bulleid nj, ellgaard l ( ) multiple ways to make disulfides. trends in biochemical sciences : -
chadani y, niwa t, izumi t, sugata n, nagao a, suzuki t, chiba s, ito k, taguchi h ( ) intrinsic ribosome destabilization underlies translation and provides an organism with a strategy of environmental sensing. molecular cell : - .e
farmery mr, allen s, allen aj, bulleid nj ( ) the role of erp in disulfide bond formation during the assembly of major histocompatibility complex class i in a synchronized semipermeabilized cell translation system. the journal of biological chemistry : -
harada y, li h, li h, lennarz wj ( ) oligosaccharyltransferase directly binds to ribosome at a location near the translocon-binding site. proceedings of the national academy of sciences of the united states of america : -
hartl fu, bracher a, hayer-hartl m ( ) molecular chaperones in protein folding and proteostasis.
nature : -
kadokura h, dazai y, fukuda y, hirai n, nakamura o, inaba k ( ) observing the nonvectorial yet cotranslational folding of a multidomain protein, ldl receptor, in the er of mammalian cells. proceedings of the national academy of sciences of the united states of america : -
klappa p, freedman rb, zimmermann r ( ) protein disulphide isomerase and a lumenal cyclophilin-type peptidyl prolyl cis-trans isomerase are in transient contact with secretory proteins during late stages of translocation. eur j biochem : -
klappa p, ruddock lw, darby nj, freedman rb ( ) the b' domain provides the principal peptide-binding site of protein disulfide isomerase but all domains contribute to binding of misfolded proteins. the embo journal : -
kodera n, yamamoto d, ishikawa r, ando t ( ) video imaging of walking myosin v by high-speed atomic force microscopy. nature : -
kojima r, okumura m, masui s, kanemura s, inoue m, saiki m, yamaguchi h, hikima t, suzuki m, akiyama s et al ( ) radically different thioredoxin domain arrangement of erp , an efficient disulfide bond introducer of the mammalian pdi family. structure (london, england : ) : -
koritzinsky m, levitin f, van den beucken t, rumantir ra, harding nj, chu kc, boutros pc, braakman i, wouters bg ( ) two phases of disulfide bond formation have differing requirements for oxygen. the journal of cell biology : -
lee jy, hirose m ( ) partially folded state of the disulfide-reduced form of human serum albumin as an intermediate for reversible denaturation. the journal of biological chemistry : -
machida k, mikami s, masutani m, mishima k, kobayashi t, imataka h ( ) a translation system reconstituted with human factors proves that processing of encephalomyocarditis virus proteins a and b occurs in the elongation phase of translation without eukaryotic release factors. the journal of biological chemistry : -
matsuo y, ikeuchi k, saeki y, iwasaki s, schmidt c, udagawa t, sato f, tsuchiya h, becker t, tanaka k et al ( ) ubiquitination of stalled ribosome triggers ribosome-associated quality control. nature communications :
mezghrani a, fassio a, benham a, simmen t, braakman i, sitia r ( ) manipulation of oxidative protein folding and pdi redox state in mammalian cells. the embo journal : -
molinari m, helenius a ( ) glycoproteins form mixed disulphides with oxidoreductases during folding in living cells. nature : -
nguyen vd, saaranen mj, karala ar, lappi ak, wang l, raykhel ib, alanen hi, salo ke, wang cc, ruddock lw ( ) two endoplasmic reticulum pdi peroxidases increase the efficiency of the use of peroxide during disulfide bond formation. journal of molecular biology : -
nguyen vd, wallis k, howard mj, haapalainen am, salo ke, saaranen mj, sidhu a, wierenga rk, freedman rb, ruddock lw et al ( ) alternative conformations of the x region of human protein disulphide-isomerase modulate exposure of the substrate binding b' domain. journal of molecular biology : -
noi k, yamamoto d, nishikori s, arita-morioka k, kato t, ando t, ogura t ( ) high-speed atomic force microscopic observation of atp-dependent rotation of the aaa+ chaperone p . structure (london, england : ) : -
okumura m, kadokura h, inaba k ( ) structures and functions of protein disulfide isomerase family members involved in proteostasis in the endoplasmic reticulum. free radical biology & medicine : -
okumura m, noi k, kanemura s, kinoshita m, saio t, inoue y, hikima t, akiyama s, ogura t, inaba k ( ) dynamic assembly of protein disulfide isomerase in catalysis of oxidative folding. nature chemical biology : -
robinson pj, bulleid nj ( ) mechanisms of disulfide bond formation in nascent polypeptides entering the secretory pathway. cells
robinson pj, kanemura s, cao x, bulleid nj ( ) protein secondary structure determines the temporal relationship between folding and disulfide formation. the journal of biological chemistry : -
robinson pj, pringle ma, woolhead ca, bulleid nj ( ) folding of a single domain protein entering the endoplasmic reticulum precedes disulfide formation. the journal of biological chemistry : -
rutkevich la, cohen-doyle mf, brockmeier u, williams db ( ) functional relationship between protein disulfide isomerase family members during the oxidative folding of human secretory proteins. molecular biology of the cell : -
rutkevich la, williams db ( ) vitamin k epoxide reductase contributes to protein disulfide formation and redox homeostasis within the endoplasmic reticulum. molecular biology of the cell : -
sato y, inaba k ( ) disulfide bond formation network in the three biological kingdoms, bacteria, fungi and mammals. the febs journal : -
sato y, kojima r, okumura m, hagiwara m, masui s, maegawa k, saiki m, horibe t, suzuki m, inaba k ( ) synergistic cooperation of pdi family members in peroxiredoxin -driven oxidative protein folding. scientific reports :
schulman s, wang b, li w, rapoport ta ( ) vitamin k epoxide reductase prefers er membrane-anchored thioredoxin-like redox partners. proceedings of the national academy of sciences of the united states of america : -
sugio s, kashima a, mochizuki s, noda m, kobayashi k ( ) crystal structure of human serum albumin at . a resolution.
protein engineering : -
tavender tj, bulleid nj ( ) molecular mechanisms regulating oxidative activity of the ero family in the endoplasmic reticulum. antioxidants & redox signaling : -
tavender tj, springate jj, bulleid nj ( ) recycling of peroxiredoxin iv provides a novel pathway for disulphide formation in the endoplasmic reticulum. the embo journal : -
tian g, xiang s, noiva r, lennarz wj, schindelin h ( ) the crystal structure of yeast protein disulfide isomerase suggests cooperativity between its active sites. cell : -
uchihashi t, watanabe yh, nakazaki y, yamasaki t, watanabe h, maruno t, ishii k, uchiyama s, song c, murata k et al ( ) dynamic structural states of clpb involved in its disaggregation function. nature communications :
wang c, yu j, huo l, wang l, feng w, wang cc ( ) human protein-disulfide isomerase is a redox-regulated chaperone activated by oxidation of domain a'. the journal of biological chemistry : -
zhang y, wölfle t, rospert s ( ) interaction of nascent chains with the ribosomal tunnel proteins rpl , rpl , and rpl of saccharomyces cerevisiae. the journal of biological chemistry : -
figure - disulfide bond introduction into rnc -aa and -aa by pdi and erp
a schematic structure of plasmids constructed in this study. ‘uorf ’ is an arrest sequence that serves to stall translation of the upstream protein and thereby prepare stable ribosome-nascent chain complexes (rncs).
the bottom cartoon represents the location of cysteines and disulfide bonds in hsa domain i. hsa domain i consists of amino acids and contains five disulfide bonds and one free cysteine at residue . a green box indicates the pro-sequence. orange circles and red lines indicate cysteines and native disulfide bonds, respectively. the region predicted to be buried in the ribosome exit tunnel is shown by a cyan box.
b domain organization of pdi and erp . redox-active trx-like domains with a cghc motif are indicated by cyan boxes, while redox-inactive ones in pdi are indicated by light-green boxes. note that the pdi b’ domain contains a substrate-binding hydrophobic pocket.
c, e time course of pdi-, erp -, and glutathione (no enzyme)-catalyzed disulfide bond introduction into rnc -aa (c) and -aa (e). ‘no ss’ and ‘ ss’ denote reduced and single-disulfide-bonded species of hsa nascent chains, respectively. note that faint bands observed between ‘no ss’ and ‘ ss’ likely represent a species in which one of the cysteines is not subjected to mal-peg modification due to glutathionylation. in support of this, these minor bands are even fainter in the absence of gsh/gssg.
d, f quantification of disulfide-bonded species for rnc -aa (d) and -aa (f) based on the results shown in (c) and (e), respectively (n = ).
figure - disulfide bond introduction into rnc -aa cys mutants by pdi and erp
a cartoon of rnc constructs used in this study. in each construct, a cysteine (represented by a black circle) was mutated to alanine. note that rnc -aa c a retains a native cysteine pairing (i.e., cys and cys ), while rnc -aa c a and c a retain a non-native pairing.
b and c time course of pdi- and erp -catalyzed disulfide bond introduction into rnc -aa c a (top), c a (middle), and c a (bottom) mutants. note that faint bands observed between ‘no ss’ and ‘ ss’ likely represent a species in which one of
the cysteines is not subjected to mal-peg modification due to glutathionylation. quantification of disulfide-bonded species of rnc -aa cys mutants is based on the results shown for the upper raw data (n = ).
d formation of a mixed disulfide bond between rnc -aa mono-cys mutants and pdi (upper)/erp (lower). ‘mixed’ and ‘no ss’ denote a mixed disulfide complex between pdi/erp and rnc mono-cys mutants and isolated rnc -aa, respectively. note that faint bands observed between ‘mixed’ and ‘no ss’ are likely non-specific bands, as they were seen at the same position regardless of which -aa mono-cys mutant was tested or whether an rnc was reacted with pdi or erp .
e quantification of mixed disulfide species based on the results shown in (d). n = .
f the cartoon on the left shows possible steric collisions between ribosomes and pdi when cys attacks the mixed disulfide between cys on rnc -aa and pdi. the cartoon on the right shows that erp can avoid this steric collision due to its higher flexibility and domain arrangement.
figure - correlation of the distance between cys residues and the ribosome exit site with the efficiency of disulfide bond introduction by pdi/erp
a cartoons of rnc constructs with [sg]-repeat insertions. a [sg] or [sg] repeat sequence was inserted into rnc- aa c a immediately after cys .
b, d pdi- (b) and erp (d)-mediated disulfide bond introduction into rnc -aa c a with insertion of [sg] (upper) or [sg] (lower) repeats after cys .
c, e quantification of disulfide-bonded species ( ss) based on the results shown in (b) and (d). n = for pdi and for erp .
f formation of a mixed disulfide bond between the -aa mono-cys mutant with a [sg] repeat and pdi (upper)/erp (lower).
note that bands observed between ‘mixed’ and ‘no ss’ are likely non-specific bands, as they were seen at the same position regardless of which -aa mono-cys [sg] mutant was tested or whether an rnc was reacted with pdi or erp .
g quantification of mixed disulfide species based on the results shown in (f). n = .
figure - disulfide bond introduction into rnc -aa by pdi and erp
a schematic structure of rnc- -aa. orange circles and red lines in the bottom cartoon indicate cysteines and native disulfides, respectively. the region predicted to be buried in the ribosome exit tunnel is shown by a cyan box.
b time course of pdi ( . m)-, erp ( . m)-, and their mixture ( . m each)-catalyzed disulfide bond introduction into rnc -aa. ‘no ss’ and ‘ ss’ denote reduced and single-disulfide-bonded species of the hsa nascent chain, respectively.
c quantification of the single-disulfide-bonded ( ss) species based on the result shown in (b) (n = ).
figure - high-speed afm analysis of erp
a afm images (scan area, × Å; scale bar, Å) for erp v-shape (left) and o-shape (right) conformations.
b left upper: histograms of circularity calculated from afm images of erp . values represent the average circularity (mean ± s.d.) calculated from curve fitting with a single- (middle and right) or two- (left) gaussian model. left lower: histograms of height calculated from afm images of erp . values represent the average height (mean ± s.d.) calculated from curve fitting with a single-gaussian model. right: two-dimensional scatterplots of the height versus circularity for erp molecules observed by hs-afm.
c time-course snapshots of oxidized erp captured by hs-afm. the images were traced for s. see also movie ev .
d time trace of the circularity of an erp molecule.
e histogram of the circularity of erp calculated from the time-course snapshots shown in (d).
figure - single-molecule observation of pdi/erp acting on -aa ca rnc by high-speed atomic force microscopy
a the afm images (scan area, Å × Å; scale bar, Å) displaying -aa ca rnc in the absence of pdi family enzymes on a ni + -coated mica surface. the surface model on the right side of each afm image illustrates a ribosome whose view angle is approximately adjusted to the observed rnc particle. s and s ribosomal subunits are shown in red and blue, respectively.
b upper afm images (scan area, Å × Å; scale bar, Å) displaying -aa ca rnc in the presence of oxidized pdi ( nm). pdi molecules that appear to bind -aa ca rnc are marked by red squares. lower images (scan area, Å × Å; scale bar, Å) highlight the regions surrounded by red squares in the upper images.
c upper afm images (scan area, Å × Å; scale bar, Å) displaying -aa ca rnc in the presence of oxidized erp ( nm). erp molecules that appear to bind -aa ca rnc are marked by blue squares. lower images (scan area, Å × Å; scale bar, Å) highlight the regions surrounded by blue squares in the upper images.
d histograms of the rnc binding time of the pdi monomer (left), the pdi dimer (middle), and erp (right), calculated from the observed afm images.
e histograms of the distance between the edge of the ribosome and the centers of rnc-neighboring pdi (left) and erp (right) molecules, calculated from the observed afm images. values represent the average distance (mean ± s.d.) calculated from curve fitting with a single-gaussian model.
figure - role of the pdi hydrophobic pocket in pdi-mediated disulfide bond introduction into rnc -aa
a disulfide bond introduction into rnc -aa by pdi i a (upper) and erp (lower). note that faint bands observed between ‘no ss’ and ‘ ss’ likely represent a species in which one of the cysteines is not subjected to mal-peg modification due to glutathionylation. in support of this, these minor bands are even fainter in the absence of gsh/gssg.
b quantification of disulfide-bonded species based on the results shown in (a). quantifications for erp and pdi are based on the results shown in fig e and f. n = .
c hs-afm analyses for binding of pdi i a to rnc ca -aa. upper afm images (scan area, Å × Å; scale bar, Å) display the pdi i a molecules that bind -aa ca rnc, as marked by red squares. lower images (scan area, Å × Å; scale bar, Å) highlight the regions surrounded by red squares in the upper images.
d histograms show the distribution of the rnc binding time of the pdi i a monomers (left) and dimers (right).
e histogram shows the distribution of the distance between the edge of the ribosome and the centers of rnc-neighboring pdi i a molecules, calculated from the observed afm images. values represent the average distance (mean ± s.d.) calculated from curve fitting with a single-gaussian model.
figure - proposed model of co-translational disulfide bond introduction into nascent chains by erp and pdi
during the early stages of translation, erp introduces disulfide bonds through transient binding to a nascent chain. for efficient disulfide introduction by erp , a pair of cysteines must be exposed by at least ~ amino acids from the ribosome exit site.
by contrast, pdi introduces disulfide bonds by holding a nascent chain inside the central cavity of the pdi homodimer during the later stages of translation, where a pair of cysteines must be exposed by at least ~ amino acids from the ribosome exit site. however, when a longer polypeptide is exposed outside the ribosome, erp - or pdi-mediated disulfide bond formation can be slower, possibly due to formation of higher-order conformation in the nascent chain. longer nascent chains may allow pdi family enzymes to compete with each other for binding and acting on rnc.
table – primers used in this study
[figure pages: graphics not recoverable from the text extraction. legible panel labels: construct schematics (uorf, flag tag, arrest sequence, ribosome exit tunnel, nascent chain, hsa domain Ⅰ); immunoblots (ib: flag) resolving "no ss" and "ss" species for pdi, erp and pdi i a with gsh/gssg and mal-peg or peg-pcmal over time (s); plots of disulfide bond introduction (%) versus time (s) and of mixed disulfide bond formed (%); hs-afm histograms of number of frames or number of molecules versus circularity, height (nm), binding time (s) and distance (nm), and traces of circularity versus time (s); model schematic (ribosome, cytosol, er lumen, erp , pdi, competition). see the corresponding figure legends.]
expanded view
distinct roles and actions of pdi family enzymes in catalysis of nascent-chain disulfide formation
chihiro hirayama , kodai machida #, kentaro noi #, tadayoshi murakawa , masaki okumura , , teru ogura , , hiroaki imataka , and kenji inaba *
figure ev - redox states of pdi and erp in glutathione redox buffer and disulfide bond introduction into aa c a, catalyzed by pdi a domain
a redox states of pdi and erp in the presence of mm gsh and . mm gssg. purified pdi and erp were incubated for min at ºc in the above glutathione redox buffer and modified with mm mal-peg k for separation on sds gels. b quantification based on the results shown in (a).
figure ev - statistical analysis of rnc molecules observed by hs-afm in the presence or absence of pdi/erp
a number of particles observed for nc-rnc or -aa ca rnc molecules present in isolation or bound to pdi/erp molecules. b ratio of nc-rnc or -aa ca rnc molecules present in isolation or bound to pdi/erp , calculated based on the observed number of particles in (a). note that a minor portion of nc-rnc or -aa ca rnc molecules were bound to many erp /pdi molecules, possibly due to severe structural damage to the rnc molecules.
figure ev - representative time-course snapshots captured by hs-afm for -aa ca rnc bound to the pdi monomer (a), the pdi dimer (b), and erp (c).
a time-course snapshots captured by hs-afm for the pdi monomer binding to -aa ca rnc. the afm images (scan area, Å × Å; scale bar, Å) displaying -aa ca rnc in the presence of oxidized pdi ( µm). white arrows indicate the monomeric pdi molecules that bind to -aa ca rnc. see also supplementary video . b time-course snapshots captured by hs-afm for the pdi dimer binding to -aa ca rnc. the afm images (scan area, Å × Å; scale bar, Å) displaying -aa ca rnc in the presence of oxidized pdi ( µm). white arrows indicate the dimeric pdi molecules that bind to -aa ca rnc. see also supplementary video . c time-course snapshots captured by hs-afm for erp binding to -aa ca rnc. the afm images (scan area, , Å × , Å; scale bar, Å) displaying -aa ca rnc in the presence of oxidized erp ( µm). white arrows indicate the erp molecules that bind to -aa ca rnc. see also supplementary video .
figure ev - representative time-course snapshots captured by hs-afm for -aa ca rnc bound to the pdi i a monomer (a), and the pdi i a dimer (b).
a time-course snapshots captured by hs-afm for the pdi i a monomer binding to -aa ca rnc. the afm images (scan area, Å × Å; scale bar, Å) displaying -aa ca rnc in the presence of oxidized pdi i a ( µm). white arrows indicate the monomeric pdi i a molecules that bind to -aa ca rnc. see also supplementary video . b time-course snapshots captured by hs-afm for the pdi i a dimer binding to -aa ca rnc. the afm images (scan area, Å × Å; scale bar, Å) displaying -aa ca rnc in the presence of oxidized pdi i a ( µm). white arrows indicate the dimeric pdi i a molecules that bind to -aa ca rnc. see also supplementary video .
movie ev - hs-afm movies showing structure dynamics of oxidized erp . this movie is a source of the time-course snapshots shown in fig c.
movie ev - hs-afm movies showing the binding of the pdi monomer to -aa ca rnc. this movie is a source of the time-course snapshots shown in supplementary fig ev a.
movie ev - hs-afm movies showing the binding of the pdi dimer to -aa ca rnc. this movie is a source of the time-course snapshots shown in supplementary fig ev b.
movie ev - hs-afm movies showing the binding of erp to -aa ca rnc. this movie is a source of the time-course snapshots shown in supplementary fig ev c.
movie ev - hs-afm movies showing the binding of the pdi i a monomer to -aa ca rnc. this movie is a source of the time-course snapshots shown in supplementary fig ev a.
movie ev - hs-afm movies showing the binding of the pdi i a dimer to -aa ca rnc. this movie is a source of the time-course snapshots shown in supplementary fig ev b.
a covid moonshot: assessment of ligand binding to the sars-cov- main protease by saturation transfer difference nmr spectroscopy
anastassia l. kantsadi , emma cattermole , minos-timotheos matsoukas , georgios a. spyroulias and ioannis vakonakis *
department of biochemistry, university of oxford, south parks road, oxford ox qu, united kingdom
department of pharmacy, university of patras, panepistimioupoli campus, gr- , greece
*to whom correspondence should be addressed, e-mail: ioannis.vakonakis@bioch.ox.ac.uk, tel.: + , fax: +
short title: assessment of ligand binding to sars-cov- mpro by std-nmr
keywords: sars-cov- , covid- , moonshot, mpro, nmr, std, screening, fragments, molecular dynamics, md, competition
.cc-by . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . /
abstract
severe acute respiratory syndrome coronavirus (sars-cov- ) is the etiological cause of the coronavirus disease , for which no effective therapeutics are available. the sars-cov- main protease (mpro) is essential for viral replication and constitutes a promising therapeutic target. many efforts aimed at deriving effective mpro inhibitors are currently underway, including an international open-science discovery project, codenamed covid moonshot.
as part of covid moonshot, we used saturation transfer difference nuclear magnetic resonance (std-nmr) spectroscopy to assess the binding of putative mpro ligands to the viral protease, including molecules identified by crystallographic fragment screening and novel compounds designed as mpro inhibitors. in this manner, we aimed to complement enzymatic activity assays of mpro performed by other groups with information on ligand affinity. we have made the mpro std-nmr data publicly available. here, we provide detailed information on the nmr protocols used and challenges faced, thereby placing these data into context. our goal is to assist the interpretation of mpro std-nmr data, thereby accelerating ongoing drug design efforts.
introduction
infections by the severe acute respiratory syndrome coronavirus (sars-cov- ) resulted in approximately . million deaths in ( ) and led to the coronavirus (covid- ) pandemic ( - ). sars-cov- is a zoonotic betacoronavirus highly similar to sars-cov and mers-cov, which caused outbreaks in and , respectively ( - ). sars-cov- encodes its proteome in a single, positive-sense, linear rna molecule of ~ kb length, the majority of which (~ . kb) is translated into two polypeptides, pp a and pp ab, via ribosomal frame-shifting ( , ). key viral enzymes and factors, including most proteins of the reverse-transcriptase machinery, inhibitors of host translation and molecules signalling for host cell survival, are released from pp a and pp ab via post-translational cleavage by two viral cysteine proteases ( ).
these proteases, a papain-like enzyme cleaving pp ab at three sites, and a c-like protease cleaving the polypeptide at sites, are primary targets for the development of antiviral drugs. the c-like protease of sars-cov- , also known as the viral main protease (mpro), has been the target of intense study owing to its centrality in viral replication. mpro studies have benefited from previous structural analyses of the sars-cov c-like protease and the earlier development of putative inhibitors ( - ). the active sites of these proteases are highly conserved, and peptidomimetic inhibitors active against mpro are also potent against the sars-cov c-like protease ( , ). however, to date no mpro-targeting inhibitors have been validated in clinical trials. in order to accelerate mpro inhibitor development, an international, crowd-funded, open-science project was formed under the banner of covid moonshot ( ), combining high-throughput crystallographic screening ( ), computational chemistry, enzymatic activity assays and mass spectroscopy ( ) among the many methodologies contributed by collaborating groups. as part of covid moonshot, we utilised saturation transfer difference nuclear magnetic resonance (std-nmr) spectroscopy ( - ) to investigate the mpro binding of ligands initially identified by crystallographic screening, as well as molecules designed specifically as non-covalent inhibitors of this protease. our goal was to provide information on ligand binding orthogonal to that which could be gained by enzymatic activity assays conducted in parallel by other groups. std-nmr is a proven method for characterising the binding of small molecules to biological macromolecules, able to provide both quantitative affinity information and structural data on the proximity of ligand chemical groups to the protein. here, we provide detailed documentation on the nmr protocols used to record these data and highlight the advantages, limitations and assumptions underpinning our approach.
our aim is to assist the comparison of mpro std-nmr data with other quantitative measurements, and facilitate the consideration of these data when designing future mpro inhibitors.
materials and methods
protein production and purification
we created a sars-cov- mpro genetic construct in pfloat vector ( ), encoding the viral protease and an n-terminal his -tag separated by a modified human rhinovirus (hrv) c protease recognition site, designed to reconstitute a native mpro n-terminus upon hrv c cleavage. the mpro construct was transformed into escherichia coli strain rosetta(de ) (novagen) and transformed clones were pre-cultured at °c for h in lysogeny broth supplemented with appropriate antibiotics. starter cultures were used to inoculate l of terrific broth autoinduction media (formedium) supplemented with % v/v glycerol and appropriate antibiotics. cell cultures were grown at °c for h and then cooled to °c for h. bacterial cells were harvested by centrifugation at , x g for min. cell pellets were resuspended in mm trisaminomethane (tris)-cl ph , mm nacl, mm imidazole buffer, incubated with . mg/ml benzonase nuclease (sigma aldrich) and lysed by sonication on ice. lysates were clarified by centrifugation at , x g at °c for h. lysate supernatants were loaded onto a hitrap talon metal affinity column (ge healthcare) pre-equilibrated with lysis buffer. column wash was performed with mm tris-cl ph , mm nacl and mm imidazole, followed by protein elution using the same buffer and an imidazole gradient from to mm concentration. the his -tag was cleaved using home-made hrv c protease.
the hrv c protease, his -tag and further impurities were removed by a reverse hitrap talon column. flow-through fractions were concentrated and applied to a superdex / size exclusion column (ge healthcare) equilibrated in nmr buffer ( mm nacl, mm na hpo ph . ).
nuclear magnetic resonance (nmr) spectroscopy
all nmr experiments were performed using a mhz solution-state instrument comprising an oxford instruments superconducting magnet, bruker avance iii console and tci probehead. a bruker samplejet sample changer was used for sample manipulation. experiments were performed and data processed using topspin (bruker). for direct std-nmr measurements, samples comprised μm mpro and variable concentrations ( μm – mm) of ligand compounds formulated in nmr buffer supplemented with % v/v d o and deuterated dimethyl sulfoxide (d -dmso, . % d, sigma aldrich) to % v/v final d -dmso concentration. in competition experiments, samples comprised μm mpro, . mm of ligand x and variable concentrations ( – μm) of competing compound in nmr buffer supplemented with d o and d -dmso as above. sample volume was μl and samples were loaded in mm outer diameter samplejet nmr tubes (bruker) placed in -tube racks. nmr tubes were sealed with pom balls. std-nmr experiments were performed at °c using a pulse sequence described previously ( ) and an excitation sculpting water-suppression scheme ( ). protein signals were suppressed in std-nmr by the application of a msec spin-lock pulse. we collected time-domain data of , complex points and . μsec dwell time ( . khz sweepwidth).
data were collected in an interleaved pattern, with on- and off-resonance irradiation data separated into blocks of transients each ( total transients per irradiation frequency). transient recycle delay was sec and on- or off-resonance irradiation was performed using . mw of power for . sec at . ppm or ppm, respectively, for a total experiment time of approximately minutes. reconstructed time-domain data from the difference of on- and off-resonance irradiation (std spectra) or only the off-resonance irradiation (reference spectra) were processed by applying a hz exponential line broadening function and -fold zero-filling prior to fourier transformation. phasing parameters were derived for each sample from the reference spectra and copied to the std spectra. h peak intensities were integrated in topspin using a local-baseline adjustment function. data fitting to extract kd values was performed in originpro (originlab). the folded state of mpro in the presence of each ligand was verified by collecting h nmr spectra similar to fig. a from all samples ahead of std-nmr experiments.
ligand handling
compounds for the initial std-nmr assessment of crystallographic fragment binding to mpro were provided by the xchem group at diamond light source in the form of a -well plated library (dsi-poised, enamine), with compounds dissolved in d -dmso at mm nominal concentration. μl of dissolved compounds was aspirated from this library and immediately mixed with μl of d -dmso for a final fragment concentration of mm, from which nmr samples were formulated. for titrations of the same crystallographic fragments, compounds were procured directly from enamine in the form of lyophilized powder, which was dissolved in d -dmso to derive compound stocks at mm and mm concentrations for nmr sample formulation. std-nmr assays of bespoke mpro ligands used compounds commercially synthesised for covid moonshot. these ligands were provided to us by the xchem group in -well plates, containing .
μl of mm d -dmso-dissolved compound per well. plates were created using an echo liquid handling robot (labcyte) and immediately sealed and frozen at - °c. for use, ligand plates were thoroughly defrosted at room temperature and spun at , g for minutes. in single-concentration std-nmr experiments, μl of a pre-formulated mixture of mpro and nmr buffer with d o and d -dmso was added to each well to create the final nmr sample. for std-nmr competition experiments, . μl of ligands were aspirated from the plates and immediately mixed with . μl of d -dmso for final ligand concentration of . mm from which nmr samples were formulated.
molecular dynamics (md) simulations
the monomeric complexes of mpro bound to chemical fragments were obtained from the rcsb protein data bank entries r (ligand x ), reb (x ), rgi (x ), rgk (x ), r (x ) and reh (x ) for md simulations with gromacs version ( ) and the amber sb-ildn force field ( ). all complexes were inserted in a pre-equilibrated box containing water implemented using the tip p water model ( ). force field parameters for the six ligands were generated using the general amber force field and hf/ – g*-derived resp atomic charges ( ). the reference system consisted of the protein, the ligand, ~ , water molecules, na and cl ions in a x x Å simulation box, resulting in a total number of ~ , atoms. each system was energy-minimized and subsequently subjected to a ns md equilibration, with an isothermal-isobaric ensemble using isotropic pressure control ( ), and positional restraints on protein and ligand coordinates.
the resulting equilibrated systems were replicated times and independent ns md trajectories were produced with a time step of fs, at a constant temperature of k, using separate v-rescale thermostats ( ) for the protein, ligand and solvent molecules. lennard-jones interactions were computed using a cut-off of Å and electrostatic interactions were treated using particle mesh ewald ( ) with the same real-space cut-off. analysis of the resulting trajectories was performed using mdanalysis ( , ). structures were visualised using pymol ( ).
notes
the enzymatic inhibition potential of mpro ligands, measured by rapidfire mass spectroscopy ( ), was retrieved from the collaborative drug discovery database ( ).
results
std-nmr assays of mpro ligand binding
mpro forms dimers in crystals via an extensive interaction interface involving two domains ( ). mpro dimers likely have a sub-μm solution dissociation constant (kd) by analogy to previously studied c-like coronavirus proteases ( ). at the μm protein concentration of our nmr assays mpro is, thus, expected to be dimeric with an estimated molecular weight of nearly kda. despite the relatively large size of mpro for solution nmr, h spectra of the protease readily showed the presence of multiple up-field shifted (< . ppm) peaks corresponding to protein methyl groups (fig. a). in addition to demonstrating that mpro is folded under the conditions tested, these spectra allowed us to identify the chemical shifts of mpro methyl groups that may be suitable for on-resonance irradiation in std-nmr experiments.
trials with on-resonance irradiation applied to different methyl group peaks showed that irradiating at . ppm (fig. a) produced the strongest std signal from ligands in the presence of mpro, while simultaneously avoiding ligand excitation that would yield false-positive signals in the absence of mpro (fig. b). further, we noted that small molecules abundant in the samples but not binding specifically to mpro, such as dmso, produced pseudo-dispersive residual signal lineshapes in std spectra, while true mpro ligands produced peaks in std with absorptive h lineshapes. we surmised that std-nmr is suitable for screening ligand binding to mpro, requiring relatively small amounts ( - μg) of protein and time (under hour) per sample studied. the strength of std signal is quantified by calculating the ratio of integrated signal intensity of peaks in the std spectrum over that of the reference spectrum (stdratio). the stdratio factor is inversely proportional to ligand kd, as $\mathrm{STD}_{\mathrm{ratio}} \propto \frac{1}{K_D + [L]}$, where [l] is ligand concentration. measuring stdratio values over a range of ligand concentrations allows fitting of the proportionality constant and calculation of ligand kd. however, time and sample-amount considerations, including the limited availability of bespoke compounds synthesized for the covid moonshot project, made recording full std-nmr titrations impractical for screening hundreds of ligands. thus, we evaluated whether measuring the stdratio value at a single ligand concentration may be an informative alternative to kd, provided restraints could be placed, for example, on the proportionality constant. theoretical and practical considerations suggested that three parameters influence our evaluation of single-concentration stdratio values towards an affinity context.
first, the stdratio factor is affected by the efficiency of noe magnetisation transfer between protein and ligand, which in turn depends on the proximity of ligand and protein groups, and the chemical nature of these groups ( - ). to minimize the influence of these factors across diverse ligands, we sought to quantify the stdratio of only aromatic ligand groups, and to consider only those showing the strongest std signal, which are thus in closest proximity to the protein. second, std-nmr assays require ligand exchange between protein-bound and -free states in the timeframe of the experiment; strongly bound compounds that dissociate very slowly from the protein would yield reduced stdratio values compared to weaker ligands that dissociate more readily. structures of mpro with many different ligands show that the protein conformation does not change upon complex formation and that the active site is fully solvent-exposed ( ), which suggests that ligand association can proceed at a high rate ( – m- s- ). under this assumption, the ligand dissociation rate is the primary determinant of interaction strength. given the duration of the std-nmr experiment in our assays, and the ratios of ligand:protein used, we estimated that significant protein – ligand exchange will take place even for interactions as strong as low-μm kd. finally, uncertainties or errors in nominal ligand concentration skew the correlation of stdratio to compound affinities; as shown in fig. s , stdratio values increase strongly when very small amounts of ligands are assessed.
thus, overly large stdratio values may be measured if ligand concentrations are significantly lower than anticipated.

quantitating mpro binding of ligands identified by crystallographic screening

mindful of the limitations inherent to measuring single-concentration stdratio values, and prior to using std-nmr to evaluate bespoke mpro ligands, we used this method to assess binding to the protease of small chemical fragments identified in crystallographic screening experiments ( ). in crystallographic screening campaigns of other target proteins such fragments were seen to have very weak affinities (> mm kd, e.g. ( )), thereby satisfying the exchange criterion set out above. non-covalent mpro interactors are part of the dsi-poised fragment library to which we were given access, comprising active site binders, two compounds targeting the mpro dimerisation interface and molecules binding elsewhere on the protein surface ( ). we initially recorded std-nmr spectra from these compounds in the absence of mpro to confirm that we obtained no or minimal std signal when protease is omitted, and to verify ligand identity from reference h spectra. five ligands gave no solution nmr signal or produced reference h spectra inconsistent with the compound chemical structure; these ligands were not evaluated further. samples of μm mpro and . mm nominal ligand concentration were then formulated from the remaining compounds (table s ), and std-nmr spectra were recorded, from which only aromatic ligand std signals were considered for further analysis.

we observed large variations in std signal intensity and stdratio values in the presence of mpro across compounds (fig. a,b; table s ), with many ligands producing little or no std signal, suggesting substantial differences in compound affinity for the protease. however, we also noted that ligand reference spectra differed substantially in intensity (fig. c), despite compounds being at the same nominal concentration. integrating ligand peaks in these reference spectra revealed differences in per- h intensity of up to ~ -fold, indicating significant variation of ligand concentrations in solution (table s ). such concentration differences could arise from errors in sample formulation or from concentration inconsistencies in the compound library. to evaluate the former we also integrated the residual h signal of d -dmso in our reference spectra, and found it to vary by less than % across any pair of samples ( % average deviation). as dmso was added alongside ligands in our samples, we concluded that sample formulation may have contributed errors in compound concentration of up to ~ / , but did not account for the ~ -fold differences in concentration observed.

given that differences in compound concentration can skew the relative stdratio values of ligands (fig. s ), and that such concentration differences were also observed among newly designed mpro inhibitors (see below), we questioned whether recording stdratio values under these conditions can provide useful information. to address this question we attempted to quantify the affinity of crystallographic fragments to mpro, selecting ligands that showed clear differences in stdratio values in the assays above and focusing on compounds binding at the mpro active site, which are therefore of potential interest to inhibitor development. we performed mpro binding titrations monitored by std-nmr of compounds x , x , x and x in μm – mm concentrations (fig.
s ), and noted that only compounds x and x , which show the highest stdratio (fig. a), bound strongly enough for an affinity constant to be estimated (kd of . ± . mm and . ± . mm, respectively). in contrast, the titrations of x and x , which yielded lower stdratio values, could not be fitted to extract a kd, indicating weaker binding to mpro. to further this analysis, we assessed the binding of fragments x , x , x , x , x and x to the mpro active site using quadruplicate atomistic molecular dynamics (md) simulations of nsec duration. as shown in fig. s a,b, and movies s and s , fragments with high stdratio values (x and x ) were always located in the mpro active site despite exchanging between different binding conformations (fig. s ), with average ligand root-mean-square-deviation (rmsd) of . Å and . Å respectively after the first nsec of simulation. medium stdratio value fragments (x and x , fig. s c,d, and movies s and s ) show average rmsds of approximately Å in the same simulation timeframe, frequently exchanging to alternative binding poses and with x occasionally exiting the mpro active site. in contrast, fragments showing very little std nmr signal (x and x , fig. s e,f, and movies s and s ) regularly exit the mpro active site and show average rmsds in excess of Å with very limited stability. combining the quantitative kd and md information above, we surmised that, despite limitations inherent in this type of analysis and uncertainties in ligand amounts, stdratio values recorded at a single compound concentration can act as proxy measurements of mpro affinity for ligands.
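the titration analysis mentioned above can be illustrated with a minimal sketch. it assumes the simple form stdratio = c / (kd + [l]), with c the proportionality constant, and fits kd by linear regression of 1/stdratio against [l]. the function names, the synthetic concentrations and the "true" kd and c values are hypothetical and chosen only for illustration; this is not the fitting procedure used by the authors, which is not described in this section.

```python
def predicted_std_ratio(kd, c, ligand_conc):
    """STDratio under the assumed model STDratio = c / (Kd + [L])."""
    return c / (kd + ligand_conc)


def fit_kd(ligand_concs, std_ratios):
    """Estimate (Kd, c) from a titration.

    Under the assumed model, 1/STDratio = Kd/c + [L]/c, i.e. a straight
    line in [L] with slope 1/c and intercept Kd/c, so Kd = intercept/slope.
    Ordinary least squares is used; real data would also need error weighting.
    """
    xs = [float(x) for x in ligand_concs]
    ys = [1.0 / r for r in std_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept / slope, 1.0 / slope


# Hypothetical, noise-free titration (concentrations and Kd in mM).
TRUE_KD = 1.7
TRUE_C = 0.05
concs = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 4.0]
ratios = [predicted_std_ratio(TRUE_KD, TRUE_C, x) for x in concs]

kd_fit, c_fit = fit_kd(concs, ratios)
print(f"fitted Kd = {kd_fit:.3f} mM, fitted c = {c_fit:.4f}")

# Same hypothetical ligand, but a tenfold lower actual concentration:
# the model predicts a larger STDratio, which inflates apparent affinity.
print(predicted_std_ratio(0.1, 1.0, 0.08) / predicted_std_ratio(0.1, 1.0, 0.8))
```

the last line shows why a single-concentration measurement is sensitive to concentration errors under this model: the same ligand at a lower actual concentration gives a larger stdratio, so a sample that contains less compound than its nominal value can appear to bind more strongly than it does.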
assessment of mpro binding by covid moonshot ligands

we proceeded to characterise by std-nmr the mpro binding of bespoke ligands created as part of the covid moonshot project and designed to act as non-covalent inhibitors of the protease ( ). similar to the assays of crystallographic fragments above, we focused our analysis of std signals on aromatic moieties of ligands binding to the mpro active site and extracted stdratio values only from the strongest std peaks. once again, we noted substantial differences in apparent compound concentrations, judging from reference h spectral intensities (fig. a), which could not be attributed to errors in sample preparation as the standard deviation of residual h intensity in the d -dmso peak did not exceed % in any of the ligand batches tested. crucially, out of different molecules tested, samples of compounds ( . %) contained no ligand and ( . %) very little ligand (fig. a). in these cases, nmr assays were repeated using a separate batch of compound; however, . % of repeat experiments yielded the same outcome of no or very little ligand in the nmr samples. we measured stdratio values from samples where ligands produced sufficiently strong reference h nmr spectra to be readily visible, and deposited these values and associated raw nmr data to the collaborative drug discovery database ( ).

some of these ligands were assessed independently for enzymatic inhibition of mpro using a mass spectrometry method as part of the covid moonshot collaboration ( ). where both parameters are available, we compared the stdratio values and % inhibition concentrations (ic ) of these ligands. as shown in fig. b, stdratio and ic values show weak correlation (r = %) for most ligands tested; however, a subset of ligands displayed conspicuously low or even no std signals considering their effect on mpro activity, and presented themselves as outliers in the correlation graph.
as these outlier ligands had ic values below μm, suggesting that their affinities to the protease may be in the μm kd region, we considered whether our approach gives rise to false-negative std results, for example through slow ligand dissociation from mpro. to address this question, we devised an assay whereby the bespoke, high-affinity mpro inhibitor would outcompete a lower-affinity ligand known to provide strong std signal from the protease active site. in these experiments the lower-affinity ligand would act as a ‘spy’ molecule whose std signal decreases as a function of inhibitor concentration. we used fragment x , which yields substantial std signal with mpro (fig. b and a), as ‘spy’, and tested protease inhibitors edj-med- a e - , lon-wei-ff b a- , cho-msk- e f- and lor-nor- bb - as x competitors. of these inhibitors, edj-med-a e - gave rise to substantial std signal in earlier assays, whereas the remaining three produced little or no std signal; yet, all four inhibitors were reported to have low-μm or sub-μm ic values based on mpro enzymatic assays. in these competition experiments, both edj-med-a e - and lon-wei-ff b a- yielded kd parameters comparable to the reported ic values (fig. s a,b), showing that at least in the case of lon-wei-ff b a- the absence of std signal in the single-concentration nmr assays above represented a false-negative result. in contrast, cho-msk- e f- and lor-nor- bb - were unable to displace x from the protease active site (fig.
s c,d), suggesting that in these two cases the reported ic values do not reflect inhibitor binding to the protease, and that the weak std signal of the initial assays was a better proxy of affinity. we surmised that although some low stdratio values of mpro inhibitors may not accurately reflect compound affinity to the protease, such values cannot be discounted as a whole as they may correspond to non-binding ligands.

discussion

fragment-based screening is a tried and tested method for reducing the number of compounds that need to be assessed for binding against a specific target in order to sample chemical space ( ). combined with x-ray crystallography, which provides information on the target site and binding pose of ligands, initial fragments can quickly be iterated into potent and specifically-interacting compounds. the covid moonshot collaboration ( ) took advantage of crystallographic fragment-based screening ( ) to initiate the design of novel inhibitors targeting the essential main protease of the sars-cov- coronavirus; however, crystallographic structures do not report on ligand affinity, and inhibitory potency in enzymatic assays does not always correlate with ligand binding. thus, supplementing these methods with solution nmr tools highly sensitive to ligand binding can provide a powerful combination of orthogonal information and assurance against false starts.
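the 'spy' competition assay described in the results can be sketched numerically. this is a minimal illustration and not the authors' analysis, which is not given in this section. it assumes textbook competitive binding, in which an inhibitor at free concentration [i] raises the apparent kd of the spy to kd_spy × (1 + [i]/ki), combined with the assumed form stdratio = c / (kd + [l]). it further assumes that free and total inhibitor concentrations are approximately equal, which may not hold for tight binders at these protein concentrations. all names and numerical values are hypothetical.

```python
def spy_std_ratio(kd_spy, c, spy_conc, inhibitor_conc, ki):
    """Spy STDratio with a competitor present (assumed competitive model)."""
    apparent_kd = kd_spy * (1.0 + inhibitor_conc / ki)
    return c / (apparent_kd + spy_conc)


def fit_ki(kd_spy, spy_conc, inhibitor_concs, spy_std_ratios):
    """Estimate the competitor Ki from the loss of spy STD signal.

    Under the assumed model,
        STD(0) / STD([I]) - 1 = [I] * Kd_spy / (Ki * (Kd_spy + [L])),
    a line through the origin, so Ki follows from its least-squares slope.
    The first data point must be the inhibitor-free reference.
    """
    if inhibitor_concs[0] != 0:
        raise ValueError("first point must be recorded without inhibitor")
    std0 = spy_std_ratios[0]
    xs = inhibitor_concs[1:]
    ys = [std0 / s - 1.0 for s in spy_std_ratios[1:]]
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return kd_spy / (slope * (kd_spy + spy_conc))


# Hypothetical, noise-free competition series (all concentrations in mM).
KD_SPY = 1.7
SPY_CONC = 0.8
TRUE_KI = 0.005
inhibitor_concs = [0.0, 0.002, 0.005, 0.01, 0.02]
signals = [spy_std_ratio(KD_SPY, 0.05, SPY_CONC, i, TRUE_KI) for i in inhibitor_concs]

ki_fit = fit_ki(KD_SPY, SPY_CONC, inhibitor_concs, signals)
print(f"fitted Ki = {ki_fit * 1000:.2f} uM")
```

in this picture an inhibitor that fails to reduce the spy signal gives a slope near zero, so no finite ki can be extracted. this corresponds to the behaviour reported for the two compounds that could not displace the spy fragment from the active site.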
we showed that std-nmr is a suitable method for characterising ligand binding to mpro, allowing us to assess ligand interactions using relatively small amounts of protein and in under one hour of experiment time per ligand (fig. b). however, screening compounds in a high-throughput manner is not compatible with the time- and ligand-amount requirements of full std-nmr titrations. thus, we resorted to using an unconventional metric, the single-concentration stdratio value, as a proxy for ligand affinity. although this metric has limitations due to its dependency on magnetisation transfer between protein and ligand, and on relatively rapid exchange between the ligand-free and -bound states, we demonstrated that it can nevertheless be informative. specifically, the relative stdratio values of chemical fragments bound to the mpro active site provided insight into fragment affinity (fig. a), as crosschecked by quantitative titrations (fig. s ) and md simulations (fig. s ). furthermore, stdratio values of covid moonshot compounds held a weak correlation to enzymatic ic parameters (fig. b), although false-negative and -positive results from both methods contribute multiple outliers. in our view, the biggest limitation of using the single-concentration stdratio value as a metric is its supra-linear sensitivity to ligand concentration (fig. s ), which, as demonstrated here, can vary substantially across ligands in a large project (fig. a).

how then should the std data recorded as part of covid moonshot be used? firstly, we showed that at least for some bespoke mpro ligands the stdratio value obtained is a better proxy for compound affinity than ic parameters from enzymatic assays (fig. s ). this is the inherent value of employing orthogonal methods: it minimises the number of potential false results.
thus, when one is considering existing mpro ligands on which to base the design of future inhibitors, a high stdratio value and a low ic parameter are both desirable. second, due to the aforementioned limitations of the single-concentration stdratio value as a proxy of affinity, and the influence of uncertainties in ligand concentrations, we believe that comparisons of compounds and derivatives differing by less than ~ % in stdratio are not meaningful. rather, we propose that the stdratio values of mpro ligands measured and available at the cdd database should be treated as a qualitative metric of compound affinity.

in conclusion, we presented here protocols for the assessment of sars-cov- mpro ligands using std-nmr spectroscopy, and evaluated the relative qualitative affinities of chemical fragments and compounds designed as part of covid moonshot. although development of novel antivirals to combat covid- is still at an early stage, we hope that this information will prove valuable to groups working towards such treatments.

references

. who. coronavirus disease [available from: https://www.who.int/emergencies/diseases/novel-coronavirus- . . kucharski aj, russell tw, diamond c, liu y, edmunds j, funk s, et al.
early dynamics of transmission and control of covid- : a mathematical modelling study. lancet infect dis. ; ( ): - . . wu f, zhao s, yu b, chen ym, wang w, song zg, et al. a new coronavirus associated with human respiratory disease in china. nature. ; ( ): - . . zhu n, zhang d, wang w, li x, yang b, song j, et al. a novel coronavirus from patients with pneumonia in china, . n engl j med. ; ( ): - . . bermingham a, chand ma, brown cs, aarons e, tong c, langrish c, et al. severe respiratory illness caused by a novel coronavirus, in a patient transferred to the united kingdom from the middle east, september . euro surveill. ; ( ): . . kuiken t, fouchier ra, schutten m, rimmelzwaan gf, van amerongen g, van riel d, et al. newly discovered coronavirus as the primary cause of severe acute respiratory syndrome. lancet. ; ( ): - . . zaki am, van boheemen s, bestebroer tm, osterhaus ad, fouchier ra. isolation of a novel coronavirus from a man with pneumonia in saudi arabia. n engl j med. ; ( ): - . . thiel v, ivanov ka, putics a, hertzig t, schelle b, bayer s, et al. mechanisms and enzymes involved in sars coronavirus genome expression. j gen virol. ; (pt ): - . . bredenbeek pj, pachuk cj, noten af, charite j, luytjes w, weiss sr, et al. the primary structure and expression of the second open reading frame of the polymerase gene of the coronavirus mhv-a ; a highly conserved polymerase is expressed by an efficient ribosomal frameshifting mechanism. nucleic acids res. ; ( ): - . . hilgenfeld r. from sars to mers: crystallographic studies on coronaviral proteases enable antiviral drug design. febs j. ; ( ): - . . ghosh ak, xi k, grum-tokars v, xu x, ratia k, fu w, et al. structure-based design, synthesis, and biological evaluation of peptidomimetic sars-cov clpro inhibitors. bioorg med chem lett. ; ( ): - . . verschueren kh, pumpor k, anemuller s, chen s, mesters jr, hilgenfeld r. 
a structural view of the inactivation of the sars coronavirus main proteinase by benzotriazole esters. chem biol. ; ( ): - . . yang h, yang m, ding y, liu y, lou z, zhou z, et al. the crystal structures of severe acute respiratory syndrome virus main protease and its complex with an inhibitor. proc natl acad sci u s a. ; ( ): - . . yang h, xie w, xue x, yang k, ma j, liang w, et al. design of wide-spectrum inhibitors targeting coronavirus main proteases. plos biol. ; ( ):e . . zhang l, lin d, sun x, curth u, drosten c, sauerhering l, et al. crystal structure of sars-cov- main protease provides a basis for design of improved alpha-ketoamide inhibitors. science. ; ( ): - . . rut w, groborz k, zhang l, sun x, zmudzinski m, pawlik b, et al. sars-cov- m(pro) inhibitors and activity-based probes for patient-sample imaging. nat chem biol. . . , achdout h, aimon a, bar-david e, barr h, ben-shmuel a, et al. covid moonshot: open science discovery of sars-cov- main protease inhibitors by combining crowdsourcing, high-throughput experiments, computational simulations, and machine learning. biorxiv. . . douangamath a, fearon d, gehrtz p, krojer t, lukacik p, owen cd, et al. crystallographic and electrophilic fragment screening of the sars-cov- main protease. nat commun. ; ( ): . . el-baba tj, lutomski ca, kantsadi al, malla tr, john t, mikhailov v, et al. allosteric inhibition of the sars-cov- main protease: insights from mass spectrometry based assays. angew chem int edit. . . mayer m, meyer b. characterization of ligand binding by saturation transfer difference nmr spectroscopy.
angew chem int ed engl. ; ( ): - . . becker w, bhattiprolu kc, gubensak n, zangger k. investigating protein-ligand interactions by solution nuclear magnetic resonance spectroscopy. chemphyschem. ; ( ): - . . walpole s, monaco s, nepravishta r, angulo j. std nmr as a technique for ligand screening and structural studies. methods in enzymology. : elsevier; . p. - . . rogala kb, dynes nj, hatzopoulos gn, yan j, pong sk, robinson cv, et al. the caenorhabditis elegans protein sas- forms large oligomeric assemblies critical for centriole formation. elife. ; :e . . hwang tl, shaka aj. water suppression that works - excitation sculpting using arbitrary wave-forms and pulsed-field gradients. journal of magnetic resonance series a. ; ( ): - . . abraham mj, murtola t, schulz r, páll s, smith jc, hess b, et al. gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. softwarex. ; : - . . lindorff-larsen k, piana s, palmo k, maragakis p, klepeis jl, dror ro, et al. improved side- chain torsion potentials for the amber ff sb protein force field. proteins. ; ( ): - . . bayly ci, cieplak p, cornell w, kollman pa. a well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the resp model. j phys chem. ; ( ): - . . bussi g, zykova-timan t, parrinello m. isothermal-isobaric molecular dynamics using stochastic velocity rescaling. j chem phys. ; ( ): . . darden t, york d, pedersen l. particle mesh ewald: an n⋅ log (n) method for ewald sums in large systems. j chem phys. ; ( ): - . . michaud-agrawal n, denning ej, woolf tb, beckstein o. mdanalysis: a toolkit for the analysis of molecular dynamics simulations. j comput chem. ; ( ): - . . gowers rj, linke m, barnoud j, reddy tje, melo mn, seyler sl, et al., editors. mdanalysis: a python package for the rapid analysis of molecular dynamics simulations. th python in science conference; ; austin, tx. . delano wl. 
the pymol molecular graphics system. delano scientific, san carlos, ca, usa. http://www.pymol.org. . . collaborative drug discovery database public access [available from: https://www.collaborativedrug.com/public-access/. . grum-tokars v, ratia k, begaye a, baker sc, mesecar ad. evaluating the c-like protease activity of sars-coronavirus: recommendations for standardized assays for drug discovery. virus res. ; ( ): - . . davies tg, wixted we, coyle je, griffiths-jones c, hearn k, mcmenamin r, et al. monoacidic inhibitors of the kelch-like ech-associated protein : nuclear factor erythroid -related factor (keap :nrf ) protein-protein interaction with high cell potency identified by fragment-based discovery. j med chem. ; ( ): - . . erlanson da, fesik sw, hubbard re, jahnke w, jhoti h. twenty years on: the impact of fragments on drug discovery. nat rev drug discov. ; ( ): - .

acknowledgements

we are grateful to nick soffe for maintenance of the oxford biochemistry solution nmr facility, to claire strain-damerell, petra lukacik and martin a. walsh for advice on mpro production, to anthony aimon and frank von delft for providing the dsi-poised fragment library, to adrián garcía, nil casajuana and clàudia llinàs del torrent for advice with md analysis tools, and to leonardo pardo for providing access to high-performance computing facilities. this work was supported by philanthropic donations to the university of oxford covid- research response fund and the oxford glycobiology institute endowment.
the oxford biochemistry nmr facility was supported by the wellcome trust ( /z/ /z), the engineering and physical sciences research council (ep/r / ), the wellcome institutional strategic support fund, the epa cephalosporin fund and the john fell oup research fund. this work was also supported by the “reinforcement of postdoctoral researchers - nd cycle” (mis- ), implemented by the greek state scholarships foundation (ΙΚΥ).

figure : d and std-nmr spectra of sars-cov- mpro. a) methyl regions from h nmr spectra of recombinant sars-cov- mpro. the spectrum on the left was recorded from a μm protein concentration sample in a mm nmr tube at oc using an excitation sculpting water-suppression method ( ). acquisitions with recycle delay of . sec were averaged, for a total experiment time of just over min. the spectrum on the right was recorded from a μm mpro sample in a mm nmr tube at oc, using the same pulse sequence and acquisition parameters. for both spectra, data were processed with a quadratic sine function prior to fourier transformation. protein resonances are weaker in the oc spectrum due to lower temperature and the reduced amount of sample used for acquisition in the smaller nmr tube. the position where on-resonance irradiation was applied for std spectra is indicated. b) vertically offset h std-nmr spectra from ligand x binding to mpro. the reference spectrum is in black with the x , h o and dmso h resonances indicated. the std spectrum of x in the presence of mpro is shown in red while that in the absence of mpro is in green.
std spectra are scaled up x compared to the reference spectrum. bottom panels correspond to magnified views of the indicated spectral regions, with x resonances assigned to chemical groups of that ligand as shown.

figure : assessment of fragment binding to mpro. a) stdratio values for chemical fragments identified by crystallographic screening as binding to mpro ( ). ligands binding to the mpro active site are coloured orange, at the mpro dimer interface in red, and elsewhere on the protein surface in blue. b) overlay of std-nmr spectra from fragments x , x and x , which bind the mpro active site, showing the ligand aromatic region in the presence of mpro. spectra are colour coded per ligand as indicated. as seen, the three fragments yield significantly different std signal intensities captured in the stdratio values shown in (a). c) overlay of reference spectra from fragments x , x and x , showing the ligand aromatic region. peak intensities vary substantially, suggesting significant differences in ligand concentration.

figure : std-nmr of covid moonshot ligands binding to mpro. a) overlay of reference spectra from the indicated covid moonshot ligands, showing the ligand aromatic region in each case, in the presence of mpro. spectra are colour coded per ligand as indicated. as seen, peak intensities vary substantially, suggesting significant differences in ligand concentration. peaks of ligand edj-med- c e a - (green) are indicated by arrows; ligand edj-med-e b d - (red) produced no peaks in the nmr spectrum. b) plot of stdratio values from covid moonshot ligands assessed by std-nmr against their ic value estimated by rapidfire mass spectrometry enzymatic assays ( ). ligands in blue show weak correlation between the two methods (red line, corresponding to an exponential function along the ic dimension). ligands in grey represent outliers of the std-nmr or enzymatic method as discussed in the text.

distribution and diversity of dimetal-carboxylate halogenases in cyanobacteria

nadia eusebio , adriana rego , nathaniel r. glasser , raquel castelo-branco , emily p. balskus * and pedro n. leão *

interdisciplinary centre of marine and environmental research (ciimar/cimar), university of porto, matosinhos, portugal
department of chemistry and chemical biology, harvard university, cambridge, ma, usa

*corresponding authors, e-mail: pleao@ciimar.up.pt, balskus@chemistry.harvard.edu

keywords: halogenases, cyanobacteria, natural products, biocatalysis

repositories: the draft genomes generated in this study are available in the genbank under bioproject sub .

.cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . /

abstract

halogenation is a recurring feature in natural products, especially those from marine organisms. the selectivity with which halogenating enzymes act on their substrates renders halogenases interesting targets for biocatalyst development. recently, cylc – the first predicted dimetal-carboxylate halogenase to be characterized – was shown to regio- and stereoselectively install a chlorine atom onto an unactivated carbon center during cylindrocyclophane biosynthesis.
homologs of cylc are also found in other characterized cyanobacterial secondary metabolite biosynthetic gene clusters. due to its novelty in biological catalysis, selectivity and ability to perform c-h activation, this halogenase class is of considerable fundamental and applied interest. however, little is known regarding the diversity and distribution of these enzymes in bacteria. in this study, we used both genome mining and pcr-based screening to explore the genetic diversity and distribution of cylc homologs. while we found non-cyanobacterial homologs of these enzymes to be rare, we identified a large number of genes encoding cylc-like enzymes in publicly available cyanobacterial genomes and in our in-house culture collection of cyanobacteria. genes encoding cylc homologs are widely distributed throughout the cyanobacterial tree of life, within biosynthetic gene clusters of distinct architectures. their genomic contexts feature a variety of biosynthetic partners, including fatty-acid activation enzymes, type i or type iii polyketide synthases, dialkylresorcinol-generating enzymes, monooxygenases or rieske proteins. our study also reveals that dimetal- carboxylate halogenases are among the most abundant types of halogenating enzymes in the phylum cyanobacteria. this work will help to guide the search for new halogenating biocatalysts and natural product scaffolds. data statement: all supporting data and methods have been provided within the article or through a supplementary material file, which includes supplementary figures and supplementary tables. .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . 
introduction

nature is a rich source of new compounds that fuel innovation in the pharmaceutical and agriculture sectors [ ]. the remarkable diversity of natural products (nps) results from a similarly diverse pool of biosynthetic enzymes [ ]. these are often highly selective and efficient, carrying out demanding reactions in aqueous media, and are therefore interesting starting points for the development of industrially relevant biocatalysts [ ]. faster and more accessible dna sequencing technologies have enabled, in the past decade, a large number of genomics and metagenomics projects focused on the microbial world [ ]. the resulting sequence data hold immense opportunities for the discovery of new microbial enzymes and their associated nps [ ].

halogenation is a widely used and well-established reaction in synthetic and industrial chemistry [ ], which can have significant consequences for the bioactivity, bioavailability and metabolic activity of a compound [ - ]. halogenating biocatalysts are thus highly desirable for biotechnological purposes [ , ]. the mechanistic aspects of biological halogenation can also inspire the development of organometallic catalysts [ ]. nature has evolved multiple strategies to incorporate halogen atoms into small molecules [ ], as illustrated by the structural diversity of thousands of currently known halogenated nps, which include drugs and agrochemicals [ , ]. until the early ’s, haloperoxidases were the only known halogenating enzymes. research on the biosynthesis of halogenated metabolites eventually revealed a more diverse range of halogenases with different mechanisms. currently, biological halogenation is known to proceed by distinct electrophilic, nucleophilic or radical mechanisms [ ]. electrophilic halogenation is characteristic of the flavin-dependent halogenases and the heme- and vanadium-dependent haloperoxidases, which catalyze the installation of c-i, c-br or c-cl bonds onto electron-rich substrates.
two families of nucleophilic halogenases are known, the halide methyltransferases and sam halogenases. both utilize s-adenosylmethionine (sam) as an electrophilic co-factor or as a co-substrate and halide anions as nucleophiles. notably, these are the only halogenases capable of generating c-f bonds. finally, radical halogenation has only been described for nonheme-iron/ -oxo-glutarate ( og)-dependent enzymes. this type of halogenation allows the selective insertion of a halogen into a non-activated, aliphatic c-h bond. a recent review by agarwal et al. ( ) thoroughly covers the topic of enzymatic halogenation.

cyanobacteria are a rich source of halogenases among bacteria, in particular for nonheme iron/ og-dependent and flavin-dependent halogenases (fig. ). ambo and welo are cyanobacterial enzymes that belong to the nonheme iron/ og-dependent halogenase family [ - ]. ambo is an aliphatic halogenase capable of site-selectively modifying ambiguine, fischerindole and hapalindole alkaloids [ , ]. the close homolog ( % sequence identity) welo is capable of performing analogous halogenations in hapalindole-type alkaloids and is involved in the biosynthesis of welwintindolinone [ , ]. barb and barb are also nonheme iron/ og-dependent halogenases that catalyze trichlorination of a methyl group from a leucine substrate attached to the peptidyl carrier protein bara in the biosynthesis of barbamide [ - ]. other halogenases from this enzyme family include jame, cura, and hctb.
jame and cura catalyse halogenations in intermediate steps of the biosynthesis of jamaicamide and curacin a, respectively [ , ], while hctb is a fatty acid halogenase responsible for chlorination in hectochlorin assembly [ ]. apdc and mcnd are fad-dependent halogenases responsible for the modification of cyanopeptolin-type peptides (also known as ( s)-amino-( r)-hydroxy piperidone (ahp)-cyclodepsipeptides). these enzymes halogenate, respectively, anabaenopeptilides in anabaena and micropeptins in microcystis strains [ - ]. aerj is another example of a fad-dependent halogenase, which acts during aeruginosin biosynthesis in planktothrix and microcystis strains [ ].

recent efforts to characterize the biosynthesis of structurally unusual cyanobacterial natural products have uncovered a distinct class of halogenating enzymes. using a genome mining approach, nakamura et al. ( ) discovered the cylindrocyclophane biosynthetic gene cluster (bgc) in the cyanobacterium cylindrospermum licheniforme atcc [ ]. the paracyclophane natural products were found to be assembled from two chlorinated alkylresorcinol units [ ]. the paracyclophane macrocycle is created by forming two c-c bonds using a friedel–crafts-like alkylation reaction catalyzed by the enzyme cylk [ ] (fig. ). therefore, although many cylindrocyclophanes are not halogenated, their biosynthesis involves a halogenated intermediate [ , ], a process termed cryptic halogenation [ ]. nakamura et al. ( ) showed that the cylc enzyme was
responsible for regio- and stereoselectively installing a chlorine atom onto the fatty acid-derived sp carbon center of a biosynthetic intermediate that is subsequently elaborated to the key alkylresorcinol monomer (fig. ). to date, cylc is the only characterized dimetal-carboxylate halogenase (this classification is based on both biochemical evidence and similarity to other diiron-carboxylate proteins) [ ]. homologs of cylc have been found in the bgcs of the columbamides [ ], bartolosides [ ], microginin [ ], puwainaphycins/minutissamides [ ], and chlorosphaerolactylates [ ], all of which produce halogenated metabolites. cylc-type enzymes bear low sequence homology to dimetal desaturases and n-oxygenases [ ], functionalize c-h bonds in aliphatic moieties at either terminal or mid-chain positions, and are likely able to carry out gem-dichlorination (kleigrewe , leão ).

the reactivity displayed by cylc and its homologs is of interest for biocatalysis, in particular because this type of carbon center activation is often inaccessible to organic synthesis [ , ]. an understanding of the molecular basis for the halogenation of different positions and for chain-length preference will also be of value for biocatalytic applications. hence, accessing novel variants of cylc enzymes will facilitate the functional characterization of this class of halogenases, mechanistic studies, and biocatalyst development.

here, we provide an in-depth analysis of the diversity, distribution and context of cylc homologs in microbial genomes. using both publicly available genomes and our in-house culture collection of cyanobacteria (legecc), we report that cylc enzymes are common in cyanobacterial genomes, found in numbers comparable to those of flavin-dependent or nonheme iron/ og-dependent halogenases.
we additionally show that cylc homologs are distributed throughout the cyanobacterial phylogeny and are, to a great extent, part of cryptic bgcs with diverse architectures, underlining the potential for np discovery associated with this new halogenase class.

figure . selected examples of halogenation reactions catalyzed by different classes of microbial enzymes, with a focus on cyanobacterial halogenases. an asterisk denotes that the enzyme has been biochemically characterized. acp: acyl carrier protein.
panel titles and enzymes shown:
dimetal-carboxylate halogenases: cylc* (cylindrospermum licheniforme atcc ); brtj (synechocystis salina lege ): unknown substrate, bartoloside i; cold/cole (moorea bouillonii png); clyc/clyd (sphaerospermopsis sp. lege )
flavin-dependent halogenases: bmp * (marinomonas mediterranea mmb- ); prna* (pseudomonas fluorescens bl ); mcnd (microcystis cf. wesenbergii niva-cya / )
nonheme iron/ og-dependent halogenases: cura* (moorea producens l); welo * (hapalosiphon welwitschii utex b )
methods

sequence similarity networks and genomic neighborhood diagrams

sequence similarity networks (ssns) were generated using the efi-est server, following a “sequence blast” of cylc (afv ) as input [ ], using negative log e-values of and for uniprot blast retrieval and ssn edge calculation, respectively. this ssn edge calculation cutoff was found to segregate the homologs into different ssn clusters; less stringent cutoff values resulted in a single ssn cluster. the retrieved sequences and the query sequence were then used to generate the ssns with an alignment score threshold of and a minimum length of . the networks were visualized in cytoscape (v . ). the full ssn obtained in the previous step was used to generate genomic neighborhood diagrams (gnds) using the efi-gnt tool [ ]. a neighborhood size of was used and the lower limit for co-occurrence was %. the resulting gnds were visualized in cytoscape (fig. ).

cyanobacterial strains and growth conditions

freshwater and marine cyanobacteria strains from the blue biotechnology and ecotoxicology culture collection (legecc) (ciimar, university of porto) were grown in ml z medium [ ] or ml z ‰ sea salts (tropic marine) with vitamin b , with orbital shaking (~ rpm) under a regimen of h light ( μmol photons m- s - )/ h dark at °c.

genomic dna extraction

fifty milliliters of each cyanobacterial strain were centrifuged at ×g for min. the cell pellets were used for genomic dna (gdna) extraction using the purelink ® genomic dna mini kit (thermo fisher scientific®) or nzy plant/fungi gdna isolation kit (nzytech), according to the manufacturer’s instructions.

primer design

basic local alignment search tool (blast) searches using cylc [cylindrospermum licheniforme utex b ] as query identified related genes (for tblastn: - % amino acid identity). we discarded nucleotide
hits with a length < and e-values < × - . the complete sequences ( cylc homolog sequences, table s ) were collected from ncbi and aligned using multiple sequence comparison by log-expectation (muscle) [ ]. phylogenetic analysis of the hits was performed using fasttree gtr with a rate of . streptomyces thioluteus aurf, encoding a distant dimetal-carboxylate protein [ ], was used as an outgroup (aj . : - ). we divided the phylogeny of cylc homologs into five groups with moderate similarity (fig. s ). the regions of higher similarity within each group were selected for degenerate primer design (table ).

table . degenerate primers
code   sequence                 expected amplicon size (bp)   tm (ºc)
af     caaaaaathgcdctyaayc      -
ar     tgdaadccttcrtgttc
bf     cacaaaaahtwgctctyaayc    -
br     gtkgtrtggwargattcatc
cf     aatcawctttaytgggtrgc     -
cr     aaraartgaaarctytcrtc
df     aatcaaacyagygcwgc
dr     gtraaataytgacaagc
xf     atcwrgaaaccartsaaga      -
xr     catcaaaaactttyygtarrc

pcr conditions

the pcr reactions to detect cylc homologs were conducted in a final volume of µl, containing . µl of ultrapure water, . µl of × gotaq buffer (promega), . µl of mgcl , . µl of dntps, . µl of reverse and . µl of forward primer (each at µm), . µl of gotaq and . µl of cyanobacterial gdna. pcr thermocycling conditions were: denaturation for min at °c; cycles with denaturation for min at °c, primer annealing for s at different temperatures ( ºc for group a; ºc for group b; ºc for group c; ºc for group d; ºc for group x) and extension for min at °c; and final extension for min at °c.
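the degenerate primers in table use iupac ambiguity codes (for example, y for c or t and h for a, c or t), so each primer is in practice a pool of distinct oligonucleotides. as a minimal sketch, not part of the original methods, the python snippet below computes the size of that pool and builds a regular expression matching every member. the function names (degeneracy, expand, to_regex) are hypothetical; only the two group a primer sequences are taken from the table.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (lowercase, as in the table).
IUPAC = {
    "a": "a", "c": "c", "g": "g", "t": "t",
    "r": "ag", "y": "ct", "s": "cg", "w": "at",
    "k": "gt", "m": "ac", "b": "cgt", "d": "agt",
    "h": "act", "v": "acg", "n": "acgt",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.lower():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence in the degenerate pool."""
    pools = [IUPAC[base] for base in primer.lower()]
    return ["".join(combo) for combo in product(*pools)]


def to_regex(primer):
    """Regular expression that matches any member of the pool."""
    parts = []
    for base in primer.lower():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return "".join(parts)


# Group a primers copied from the table; other groups work the same way.
primers = {"af": "caaaaaathgcdctyaayc", "ar": "tgdaadccttcrtgttc"}

for code, seq in primers.items():
    print(code, degeneracy(seq), to_regex(seq))
```

for the group a forward primer this gives a pool of 36 sequences (3 × 3 × 2 × 2, from h, d, y and y). the regular expression can be used to look for candidate binding sites in a nucleotide sequence, although a real primer search would also have to consider the reverse complement and mismatches.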
when not already available, the s rrna gene for a tested strain was amplified by pcr, using standard primers for amplification (cya f ’ cgg acg ggt gag taa cgc gtg a ’ and cya r ’ gac tac wgg ggt atc taa tcc ’). the pcr reactions were conducted in a final volume of µl, containing . µl of ultrapure water, . µl of × gotaq buffer, . µl of mgcl , . µl of dntps, . µl of reverse primer and . µl of forward primer (each at µm), . µl of gotaq and . µl of cyanobacterial dna. pcr thermocycling conditions were: denaturation for min at °c; cycles with denaturation for min at °c, primer annealing for s at ºc and extension for min at °c; and final extension for min at °c. amplicon sizes were confirmed after separation in a . % agarose gel.

cloning and sequencing

the cylc homolog and s rrna gene sequences were obtained either directly from the ncbi or through sequencing. to obtain high quality sequences, topo pcr cloning (invitrogen) was used. the topo cloning reaction was conducted in a final volume of µl, containing µl of fresh pcr product, µl of salt solution, . µl of topo vector and . µl of water. the reaction was incubated for min at room temperature. three microliters of topo reaction were added into a tube containing chemically competent e. coli (top , life technologies) cells. after min of incubation on ice, the cells were placed for s at ºc without shaking and were then immediately transferred to ice. µl of room temperature soc medium were added to the previous mixture and the tube was horizontally shaken at ºc for h ( rpm).
µl of the different cloning reactions were spread onto lb ampicillin/x-gal plates and incubated overnight at ºc. two or three positive colonies from each reaction were tested by colony-pcr. the pcr was conducted in a final volume of µl, containing . µl of ultrapure water, . µl of x gotaq buffer, . µl of mgcl , . µl of dntps, . µl of reverse pucr and . µl of forward pucf primers (each at µm), . µl of gotaq and the target colony. pcr thermocycling conditions were: denaturation for min at °c; cycles with denaturation for min at °c, primer annealing for s at ºc and extension for min at °c; and final extension for min at °c. amplicon sizes were confirmed after separation in an . % agarose gel. selected colonies were incubated overnight at ºc ( rpm), in ml of lb supplemented with µg ml- ampicillin. the plasmids containing the amplified pcr products were extracted (nzyminiprep kits) and sanger sequenced using puc primers.

cyanobacteria genome sequencing

many of the legecc strains are non-axenic, and so before extraction of gdna for genome sequencing, an evaluation of the amount of heterotrophic contaminant bacteria in cyanobacterial cultures was performed by plating onto z or z with added . % sea salts (tropic marine) and vitamin b ( µg/l) agar medium (depending on the original environment) supplemented with casamino acids ( . % wt/vol) and glucose ( . % wt/vol) [ ]. the plates were incubated for - days at ºc in the dark and examined for bacterial growth. those cultures with minimal contamination were used for dna extraction for genome sequencing.
the selection of dna extraction methodology used was based on morphological features of each strain. total genomic dna was isolated from a fresh or frozen pellet of ml culture using a ctab-chloroform/isoamyl alcohol-based protocol [ ] or using the commercial purelink genomic dna mini kit (thermo fisher scientific®) or the nzy plant/fungi gdna isolation kit (nzytech). the latter included a homogenization step (grinding cells using a mortar and pestle with liquid nitrogen) before extraction using the standard kit protocol. the quality of the gdna was evaluated in a ds- fx spectrophotometer (denovix) and by % agarose gel electrophoresis, before genome sequencing, which was performed elsewhere (era , spain and microbesng, uk) using × bp paired-end libraries and the illumina platform (except for synechocystis sp. lege , whose genome was sequenced using the ion torrent pgm platform). a standard pipeline including the identification of the closest reference genomes for read mapping using kraken [ ] and bwa-mem to check the quality of the reads [ ] was carried out, while de novo assembly was performed using spades [ ]. the genomic data obtained for each strain was treated as a metagenome. the contigs obtained as previously mentioned were analyzed using the binning tool maxbin . [ ] and checked manually in order to obtain only cyanobacterial contigs. the draft genomes were annotated using the ncbi prokaryotic genome annotation pipeline (pgap) [ ] and submitted to genbank under the bioproject number sub . in the case of hyella patelloides lege and sphaerospermopsis sp. lege the assemblies had been previously deposited in ncbi under the biosample numbers samea and samn , respectively.
genomic context of cylc homologs

blastp searches using cylc [cylindrospermum licheniforme utex b ] as query identified related cylc homologs within the publicly available cyanobacterial genomes and in the genomes of legecc strains. we annotated the genomic context for each cylc homolog using antismash v . [ ] and manual annotation through blastp of selected proteins. some bgcs were not identified by antismash and were manually annotated using blastp searches.

phylogenetic analysis

nucleotide sequences of cylc homologs obtained from the ncbi and from genome sequencing in this study were aligned using muscle from within the geneious r . software package (biomatters). the nucleotide sequence of the distantly-related dimetal-carboxylate protein aurf [ ] from streptomyces thioluteus (aj . : - ) was used as an outgroup. the alignments, trimmed to their core , , , and positions (for group a, b, c, d and x, respectively), were used for phylogenetic analysis, which was performed using fasttree (from within geneious), using a gtr substitution model (from jmodeltest, [ ]) with a rate of (fig. s ). for the phylogenetic analysis based on the s rrna gene (fig. , fig. s ), the corresponding nucleotide sequences were retrieved from the ncbi (from publicly available genomes until march , ) or from sequence data (amplicon or genome) obtained in this study. the sequences were aligned as detailed for cylc homologs and trimmed to the core shared positions ( ). a raxml-hpc phylogenetic tree inference using maximum likelihood/rapid bootstrapping run on xsede ( . . ) with bootstrap iterations in the cipres platform [ ] was performed. the amino acid sequences of cylc homologs were aligned using muscle from within the geneious software package (biomatters).
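the trimming step mentioned above (alignments "trimmed to their core positions") can be illustrated with a short sketch. the text does not define how the core was chosen, so the snippet assumes, purely for illustration, that the core is the block of columns in which every sequence has already started and none has yet ended; the function names and the toy alignment are hypothetical and are not taken from the study.

```python
def core_bounds(aligned, gap="-"):
    """Return (start, end) of the column block covered by every sequence.

    Assumption: 'core' means the region between the latest first residue
    and the earliest last residue across all aligned sequences.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    # Leading gap count per sequence; the largest one marks the core start.
    start = max(len(seq) - len(seq.lstrip(gap)) for seq in aligned)
    # Position after the last residue; the smallest one marks the core end.
    end = min(len(seq.rstrip(gap)) for seq in aligned)
    if start >= end:
        raise ValueError("sequences share no core region")
    return start, end


def trim_to_core(alignment):
    """Trim a {name: aligned_sequence} mapping to its shared core block."""
    start, end = core_bounds(list(alignment.values()))
    return {name: seq[start:end] for name, seq in alignment.items()}


# Toy alignment with ragged ends (hypothetical data).
toy = {
    "seq_a": "--acgtac--",
    "seq_b": "acacgtacgt",
    "seq_c": "-cacgta---",
}
print(trim_to_core(toy))
```

internal gaps are left untouched in this sketch; only the ragged ends are removed, which keeps the column count identical for all sequences passed on to tree inference.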
the alignments were trimmed to their core residues and used for phylogenetic analysis, which was performed using raxml-hpc phylogenetic tree inference using maximum likelihood/rapid bootstrapping run on xsede ( . . ) with bootstrap iterations in the cipres platform [ ] (fig. c).

corason analysis

corason, a bioinformatic tool that computes multi-locus phylogenies of bgcs within and across gene cluster families [ ], was used to analyze cyanobacterial genomes collected from the ncbi and the legecc genomes (table s ). in total, cyanobacterial genomes recovered from ncbi and additional lege genomes were used in the analysis. the amino acid sequences of cura (aat . ), welo (ahi . ), mcnd (cci . ), bmp (wp_ . ), prna (wp_ . ) and cylc (aru . ) were used as query and, for each enzyme, a reference genome was selected (table s ). to increase the phylogenetic resolution, selected genomes were removed from the analysis of enzymes cylc, prna, cura, mcnd and bmp (table s ). additionally, for the cylc analysis, a few bgcs were manually extracted and included in the analysis (table s ) since they were not detected by corason.

prevalence of halogenases in cyanobacterial genomes

representative proteins of each class were used as query in each search: cylc (aru . ), brtj (akv . ), “mic” (wp_ . ; the halogenase in the putative microginin gene cluster), cold (akq . ), cole (akq . ), noco (akl . ), nocn (akl . ) for dimetal-carboxylate halogenases; prna (wp_ . ), bmp (wp_ . ), and mcnd (cci . ) for flavin-dependent halogenases; the halogenase domains from cura (aat .
), and the halogenases barb (aan . ), hctb (aay . ), welo (ahi . ) and ambo (akp . ) for nonheme iron-dependent halogenases. non-redundant sequences obtained for these searches using a × - e-value cutoff, which represents a percentage identity between the query and target protein greater than %, were considered to share the same function as the query.

results and discussion

cylc-like halogenases are mostly found in cyanobacteria

to investigate the distribution of cylc homologs encoded in microbial genomes, we first searched the reference protein (refseq) or non-redundant protein sequences (nr) databases (ncbi) for homologs of cylc or brtj, using the basic local alignment search tool, blastp (min % identity, . × - e-value and % coverage). a total of and homologous unique protein sequences were retrieved using the refseq or nr databases, respectively; in both cases, sequences were primarily from cyanobacteria ( and %, respectively) (fig. a). we then used the enzyme similarity tool of the enzyme function initiative (efi-est) [ ] to evaluate the sequence landscape of dimetal-carboxylate halogenases. using cylc as query, we obtained an ssn (sequence similarity network) composed of sequences retrieved from the uniprot database [ ] (fig. b). the ssn featured two major clusters, one containing homologs from diverse cyanobacterial genera, the other composed of homologs from several cyanobacteria, with a few from proteobacteria (mostly deltaproteobacteria) and two from the cyanobacteria sister-phylum melainabacteria.
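the blastp searches above combine three filters: minimum percent identity, maximum e-value and minimum query coverage. the sketch below shows one way such a filter could be applied to tabular blast output, assuming the default 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). the function name, the hit names, the query length and all threshold values are placeholders chosen for illustration; they are not the cutoffs used in this study.

```python
import csv
import io


def filter_hits(tabular, query_length, min_identity, max_evalue, min_coverage):
    """Return unique subject ids that pass identity, e-value and coverage filters.

    Coverage is computed as the aligned span on the query divided by the
    full query length, expressed as a percentage.
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject = row[1]
        identity = float(row[2])
        qstart, qend = int(row[6]), int(row[7])
        evalue = float(row[10])
        coverage = 100.0 * (abs(qend - qstart) + 1) / query_length
        if (identity >= min_identity
                and evalue <= max_evalue
                and coverage >= min_coverage):
            kept.append(subject)
    # Remove duplicate subjects while preserving the original order.
    return list(dict.fromkeys(kept))


# Hypothetical hits for a 450-residue query (illustrative values only).
example = (
    "query\thit_a\t62.5\t400\t150\t2\t10\t409\t5\t404\t1e-120\t380\n"
    "query\thit_b\t28.0\t120\t86\t1\t100\t219\t30\t149\t3e-4\t45\n"
    "query\thit_c\t45.0\t410\t225\t3\t1\t410\t12\t421\t2e-80\t250\n"
)

print(filter_hits(example, query_length=450,
                  min_identity=30.0, max_evalue=1e-20, min_coverage=80.0))
```

with these placeholder thresholds, hit_b is rejected on all three criteria, while hit_a and hit_c are retained as candidate homologs.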
a third ssn cluster was composed only of the previously reported brtj enzymes and, finally, a homolog from the cyanobacterial genus hormoscilla remained unclustered. we were unable to recover any ssn that included clusters containing other characterized enzyme functions, which attests to the uniqueness of the dimetal-carboxylate halogenases in the current protein-sequence landscape.

figure . abundance of cylc homologs in bacteria. a) blastp using cylc (genbank accession no: aru ) as query against different databases shows that these dimetal-carboxylate enzymes are found almost exclusively in cyanobacteria. b) sequence similarity network (ssn) of cylc depicting the similarity-based clustering of uniprot-derived protein sequences with homology (blast e-value cutoff × - , edge e-value cutoff × - ) to cylc (genbank accession no: aru ). in each node, the bacterial genus for the corresponding uniprot entry is shown (na: not attributed).
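the way an edge cutoff shapes ssn clusters, as described in the methods (a stringent cutoff separates the homologs into several clusters, a lenient one merges them into a single cluster), can be sketched as connected components over thresholded pairwise e-values. this is a simplified illustration and not the efi-est implementation; the node names, e-values and thresholds are invented for the example.

```python
import math


def ssn_clusters(nodes, pairwise_evalues, min_neg_log_e):
    """Group nodes into clusters by keeping edges with -log10(e) >= cutoff."""
    parent = {node: node for node in nodes}

    def find(x):
        # Union-find root lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), evalue in pairwise_evalues.items():
        # An e-value of zero is treated as an infinitely strong edge.
        score = math.inf if evalue <= 0 else -math.log10(evalue)
        if score >= min_neg_log_e:
            parent[find(a)] = find(b)

    clusters = {}
    for node in nodes:
        clusters.setdefault(find(node), set()).add(node)
    # Largest clusters first, ties broken alphabetically.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


# Hypothetical proteins and pairwise e-values.
nodes = ["p1", "p2", "p3", "p4"]
evalues = {("p1", "p2"): 1e-90, ("p3", "p4"): 1e-85, ("p2", "p3"): 1e-30}

print(ssn_clusters(nodes, evalues, min_neg_log_e=60))  # stringent cutoff
print(ssn_clusters(nodes, evalues, min_neg_log_e=20))  # lenient cutoff
```

in this toy network the stringent cutoff drops the weak p2 to p3 edge and yields two clusters, while the lenient cutoff keeps it and collapses everything into one cluster, mirroring the behaviour reported for the cylc ssn.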
cylc homologs are widely distributed throughout the phylum cyanobacteria

with the intent of accessing a wide diversity of cylc homolog sequences, we decided to use a degenerate-primer pcr strategy to discover additional homologs in cyanobacteria from the legecc culture collection [ ], because the phylum cyanobacteria is diverse and still underrepresented in terms of genome data [ - ]. the legecc culture collection maintains cultures isolated from diverse freshwater and marine environments, mostly in portugal, and, for example, contains all known bartoloside-producing strains [ ]. primers were designed based on nucleotide sequences retrieved from the ncbi that were selected to represent the phylogenetic diversity of cylc homologs (fig. s ). due to the lack of highly conserved nucleotide sequences among all homologs considered, we divided the nucleotide alignment into five groups and designed a degenerate primer pair for each. upon screening strains from legecc using the five primer pairs, we retrieved sequences encoding cylc homologs, confirmed through cloning and sanger sequencing of the obtained amplicons. we were unable to directly analyze the diversity of the entire set of legecc-derived cylc amplicons due to low overlap between sequences obtained with different primers. as such, we performed a phylogenetic analysis of the diversity retrieved with each primer pair (fig. s ), by aligning the pcr-derived sequences with a set of diverse cylc genes retrieved from the ncbi. for some strains, our pcr screen retrieved more than one homolog using different primer pairs (e.g. nostoc sp. lege or planktothrix mougeotii lege ). in general, and for each primer pair, the pcr screen retrieved mostly sequences that were closely related and associated with one or two phylogenetic clades.
this can likely be explained by the geographical bias that might exist in the legecc culture collection [ ] and/or by primer design and pcr efficiency issues, which might have favored certain phylogenetic clades.

to access full-length sequences of the cylc homologs identified among legecc strains, as well as their genomic context, we undertook a genome-sequencing effort informed by our pcr screen. we selected strains for genome sequencing, which represent the diversity of cylc homologs observed in the different pcr screening groups. the resulting genome data was used to generate a local blast database and the homologs were located within the genomes. in some cases, additional homologs that were not detected in the pcr screen were identified. overall, full-length genes encoding cylc homologs were retrieved from legecc strains.

to explore the phylogenetic distribution of cylc homologs encoded in publicly available reference genomes and the herein sequenced legecc genomes, we aligned the s rrna genes from strains with refseq genomes and the legecc strains that were screened by pcr in this study. using this dataset, we performed a phylogenetic analysis which indicated that cylc homologs are broadly distributed through five cyanobacterial orders: nostocales, oscillatoriales, chroococcales, synechococcales and pleurocapsales (fig. , fig. s ). it is noteworthy that the cyanobacterial orders for which we did not find cylc homologs (chroococcidiopsidales, spirulinales, gloeomargaritales and gloeobacterales) are poorly represented in our dataset (fig. , fig. s ).
however, our previous blastp search against the nr database did retrieve two close homologs in two chroococcidiopsidales strains (genera aliterella and chroococcidiopsis) and a more distant homolog in a gloeobacter strain (gloeobacterales) (table s ). given the wide but punctuated presence of cylc homologs among the cyanobacterial diversity considered in this study, it is unclear how much of the current cylc homolog distribution reflects vertical inheritance or horizontal gene transfer events.

figure . raxml cladogram of the s rrna gene of legecc strains (grey squares) and from cyanobacterial strains with ncbi-deposited reference genomes, screened in this study. taxonomy is presented at the order level (colored rectangles). strains whose genomes encode cylc homologs are denoted by black squares. green squares indicate that at least one homolog was detected by pcr-screening and verified by retrieving the sequence of the corresponding amplicon by cloning followed by sanger sequencing. gloeobacter violaceus pcc served as an outgroup. a version of this cladogram including the bootstrap values for replications is provided as supplementary material.
[figure . cladogram legend (tip labels omitted). colored ranges: nostocales, oscillatoriales, chroococcales, synechococcales, pleurocapsales, chroococcidiopsidales, spirulinales, gloeomargaritales, gloeobacterales. symbols: cylc homologs; cylc homologs identified by screen; legecc strains.]

diversity of bgcs encoding cylc homologs

to characterize the biosynthetic diversity of bgcs encoding cylc homologs, which were found in cyanobacterial genomes ( from legecc and from refseq) from different orders, we first submitted these genome sequences for antismash [ ] analysis. cylc-encoding bgcs were detected, which were classified as resorcinol, nrps, pks, or hybrid nrps-pks. given the number of cylc homolog-encoding genes detected in these genomes ( ), we considered that several bgcs might not have been identified with antismash. therefore, we performed manual annotation of the genomic contexts of the cylc homologs and were able to identify additional bgcs. upon analysis of the entire set of cylc-encoding bgcs, we classified the bgcs in seven major categories, based on their overall architecture, which we designated as follows (listed in decreasing abundance): rieske-containing (n = ), type i pks (chlorosphaerolactylate/columbamide/microginin/puwainaphycin-like, n = ), type iii pks (n = ), dialkylresorcinol (n = ), pria-containing (n = ), nitronate monooxygenase-containing (n = ) and cytochrome p /sulfotransferase-containing (n = ) (fig. a, figs. s -s ). three bgcs were excluded from our classification since they were only partially sequenced (fig. s ). examples of each of the cluster architectures are presented in fig. a and schematic representations of each of the classified bgcs are presented in supplementary figures s -s . it should be stressed that within several of these seven major categories, there is still considerable bgc architecture diversity, notably within the dialkylresorcinol, type i and type iii pks bgcs. rieske-containing bgcs are not associated with any known np and encode between two and four proteins with rieske domains. 
most contain a sterol desaturase family protein, feature a single cylc homolog and are chiefly found among nostocales and oscillatoriales (fig. s ). pria-containing bgcs encode, apart from the primosomal protein n' (pria), a diguanylate cyclase/phosphodiesterase, an aromatic ring-hydroxylating dioxygenase subunit alpha and a ferritin-like protein and were only detected in synechocystis spp. (fig. s ). these are similar to the rieske-containing bgcs; however, in strains harboring pria-containing bgcs, the additional functionalities that are found in the rieske-containing bgcs can be found dispersed throughout the genome (table s ). in our dataset, a single sulfotransferase/p containing bgc was detected in stanieria sp. and was unrelated to the above-mentioned architectures (fig. s ). type i pks bgcs encode clusters similar to those of the chlorosphaerolactylates, columbamides, microginins and puwainaphycins and typically feature a fatty acyl-amp ligase (faal) and an acyl carrier protein upstream of one or two cylc homologs and a type i pks downstream of the cylc homolog(s). these were found in nostocales and oscillatoriales strains (fig. s ). taken together with the known np structures associated with these bgcs [ , , ], we can expect that the encoded metabolites feature halogenated fatty acids in terminal or mid-chain positions. bgcs of the dialkylresorcinol type, which contain dara and darb homologs (bode , leão ), including several bartoloside-like clusters (found only in legecc strains), were detected in nostocales, pleurocapsales and chroococcales (fig. 
s ). type iii pks bgcs encoding cylc homologs, which include a variety of cyclophane bgcs, were detected in the nostocales, oscillatoriales and pleurocapsales (fig. s ). finally, nitronate monooxygenase-containing bgcs, which are not associated with any known np, were only found in nostocales strains from the legecc and also featured genes encoding pksi, ferredoxin, acp or glycosyl transferase (fig. s ).

a less bgc-centric perspective of the genomic context of cylc homologs could be obtained through the genome neighborhood tool of the efi (efi-gnt, [ ]). using the previously generated ssn as input, we analyzed the resulting genomic neighborhood diagrams (fig. b), which indicated that the three ssn clusters had entirely different genomic contexts (herein defined as upstream and downstream genes from the cylc homolog). the ssn cluster that encompasses cylc and its closest homologs indicates that these enzymes associate most often with pp-binding (acp/pcps) and amp-binding (such as faals) proteins. regarding the ssn cluster that includes both cyanobacterial and non-cyanobacterial cylc homologs, their genomic contexts most prominently feature rieske/[ fe- s] cluster proteins as well as fatty acid hydroxylase family enzymes. the cyanobacterial homologs are exclusively encoded in the rieske and pria-containing bgcs. homologs from this particular ssn cluster may not require a phosphopantetheine-tethered substrate, as no substrate activation or carrier proteins/domains were found in their genomic neighborhoods, or may act on central fatty acid metabolism intermediates. the brtj ssn cluster, composed only of the two reported brtj enzymes, shows entirely different surrounding genes, which correspond to the brt genes. also noteworthy is the considerable number of proteins with unknown function found in the vicinity of dimetal-carboxylate halogenases, suggesting that uncharted biochemistry is associated with these enzymes.

since ssn analysis generated only three clusters of cylc homologs, we next investigated the genetic relatedness among these enzymes and how it correlates to bgc architecture. we performed a phylogenetic analysis of the cylc homologs from the classified and unclassified bgcs (fig. c). our analysis indicated that pria-containing and rieske-containing bgcs formed a well-supported clade. its sister clade contained homologs from the remaining bgcs. within this larger clade, homologs associated with the type i pks, dialkylresorcinol or type iii pks bgcs were found to be polyphyletic. in some cases, the same bgc contained distantly related cylc homologs (e.g. hyella patelloides lege , anabaena cylindrica pcc ) (fig. c). this analysis also revealed that several strains (fig. c) encode two or three phylogenetically distant cylc homologs in different bgcs. overall, our data show that cylc homologs have evolved to interact with different partner enzymes to generate chemical diversity, but that their phylogeny is, in some cases, not entirely consistent with bgc architecture. these observations suggest that functionally convergent associations between cylc homologs and other proteins have emerged multiple times during evolution. examples include the cylc/cylk and brtj/brtb associations, which use cryptic halogenation to achieve c-c and c-o bond formation, respectively [ , ]. however, the role of the cylc homolog-mediated halogenation of fatty acyl moieties observed for other cyanobacterial metabolites is not currently understood. 
interestingly, while a number of cylc homologs, including those that are part of characterized bgcs, likely act on acp-tethered fatty acyl substrates [ , ], those from the pria-, rieske- and cytochrome p /sulfotransferase categories do not have a neighboring carrier protein and therefore might not require a tethered substrate. this would be an important property for a cylc-like biocatalyst [ ].

[figure . panel labels (tip labels of the cladogram omitted).
panel a, bgc category (example strain) and product: pria-containing (synechocystis sp. pcc ), unknown product; rieske-containing (calothrix brevissima nies- ), unknown product; type iii pks (cylindrospermum licheniforme utex b ), cylindrocyclophanes; dialkylresorcinol (synechocystis salina lege ), bartolosides; type i pks (chlorosphaerolactylates/columbamides/microginin/puwainaphycin-like) (moorea bouillonii png - ), columbamides; nitronate monooxygenase-containing (nostoc sp. lege ), unknown product; sulfotransferase/p -containing (stanieria sp. nies- ), unknown product.
panel b, pfam descriptions: abc_tran, abc transporter; acp_syn_iii_c, -oxoacyl-[acyl-carrier-protein (acp)] synthase iii c terminal; amp-binding, amp-binding enzyme; beta_helix, right handed beta helix region; biotin_lipoyl_ -hlyd_d , biotin-lipoyl like-barrel-sandwich domain of cusb or hlyd membrane-fusion; duf , domain of unknown function (duf ) beta-propeller; duf , protein of unknown function (duf ); duf , protein of unknown function (duf ); fa_hydroxylase, fatty acid hydroxylase superfamily; fer , fe- s iron-sulfur cluster binding domain; ftsx, ftsx-like permease family; gh , gh auxin-responsive promoter; glycos_transf_ , glycosyl transferases group ; hexapep, bacterial transferase hexapeptide (six repeats); hlyd_d , barrel-sandwich domain of cusb or hlyd membrane-fusion; pp-binding, phosphopantetheine attachment site; rieske, rieske [ fe- s] domain; sbbp, beta-propeller repeat; tubc_n, tubc n-terminal docking domain; udpgt, udp-glucoronosyl and udp-glucosyl transferase.
panel c, colored ranges: nitronate monooxygenase-containing; pria-containing; rieske-containing; pksiii; dialkylresorcinol; chlorosphaerolactylate/columbamides/microginin/puwainaphycin-like; sulfotransferase/p containing; others.]

figure . diversity and genomic context of cylc-like enzyme bgcs. a) examples of the different bgc architectures found among the clusters encoding cylc homologs. b) genome neighborhood diagram (gnd) depicting the pfam domains associated with each cluster from the initial ssn of cylc homologs. the size of each node is proportional to the prevalence of the pfam domain within the genomic context of the cylc homologs from each ssn cluster. c) raxml cladogram ( replicates, shown are bootstrap values > %) of cylc homologs. the different colors represent a categorization based on common genes found within the associated biosynthetic gene clusters (see legend). 
circles of the same color depict cylc homologs encoded by the same bgc. aurf (streptomyces thioluteus hki- ) was used as an outgroup.

cylc enzymes and other cyanobacterial halogenases

we sought to understand how cylc-type halogenases compare to other halogenating enzyme classes found in cyanobacteria in terms of prevalence and association with bgcs. to this end, we carried out a corason [ ] analysis of publicly available cyanobacterial genomes (including non-reference genomes) and the herein acquired genome data from legecc strains (a total of , cyanobacterial genomes). we used different cyanobacterial halogenases as input, namely cylc, mcnd, prna, bmp , and the og-fe(ii) oxygenase domains from cura and barb . corason attempts to retrieve genome context by exploring gene cluster diversity linked to enzyme phylogenies [ ]. the corason analysis retrieved ( . %) dimetal-carboxylate halogenases, ( . %) nonheme iron-dependent halogenases and ( . %) flavin-dependent halogenases from the cyanobacterial genomes (fig. a). using the protein homologs detected in bgcs by corason, a sequence alignment was performed for dimetal-carboxylate, nonheme iron/ og-dependent and flavin-dependent halogenases. for nonheme iron/ og-dependent halogenases, we excised the halogenase domain from multi-domain enzyme sequences. after removing repeated sequences and trimming the alignments to their core shared positions, maximum-likelihood phylogenetic trees were constructed for each halogenase class and bgcs were annotated manually (figs. s -s ). flavin-dependent halogenases were commonly associated with cyanopeptolin, , -dibromophenol and pyrrolnitrin bgcs and with orphan bgcs of distinct architectures (fig. s ). regarding nonheme iron/ og-dependent halogenases, we identified barbamide, curacin, hectochlorin and terpene/indole [ ] bgcs and several distinct orphan bgcs (fig. s ).
for dimetal-carboxylate halogenases, columbamide, microginin, chlorosphaerolactylate, bartoloside and cyclophane bgcs were identified (fig. s ). however, while some of the cylc homolog-encoding orphan bgcs previously identified by antismash and manual searches were detected by corason, the rieske- and the pria-containing bgcs were not. hence, several cylc homologs were not accounted for in this analysis. for the same reasons, the corason-derived datasets for the other two halogenase types could also be missing some of their members. to circumvent this limitation and obtain a more comprehensive picture of the abundance of the three types of halogenase in cyanobacterial genomes, we used blastp searches against available cyanobacterial genomes in the ncbi database (including non-reference genomes). several representatives of each halogenase class were used as queries in each search (cylc, brtj, “mic” (the halogenase in the putative microginin gene cluster), cold, cole, noco and nocn for dimetal-carboxylate halogenases; prna, bmp and mcnd for flavin-dependent halogenases; the halogenase domain from cura and the halogenases barb , hctb, welo and ambo for nonheme iron-dependent halogenases). non-redundant sequences obtained from these searches using a × - e-value cutoff (corresponding to > % sequence identity) were considered to share the same function as the query.
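the filtering step described above can be sketched in a few lines of python. this is only an illustration under stated assumptions, not the authors' pipeline: it assumes blastp was run with tabular output (-outfmt 6, where column 2 is the subject id and column 11 is the e-value), it treats hits with the same subject id as redundant, and the cutoff value `EVALUE_CUTOFF` is a hypothetical placeholder because the actual threshold is elided in the text. the helper names `filter_hits` and `homologs_per_genome` are likewise invented for this sketch.

```python
import csv
import io

# Hypothetical placeholder: the source text elides the real e-value cutoff.
EVALUE_CUTOFF = 1e-20


def filter_hits(tabular_text, evalue_cutoff=EVALUE_CUTOFF):
    """Return {subject_id: best_evalue} for hits passing the cutoff.

    Expects BLAST tabular output (-outfmt 6): column 2 is the subject ID
    and column 11 is the e-value. A subject hit by several queries is
    kept only once (assumption: same ID means same sequence).
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject, evalue = row[1], float(row[10])
        if evalue <= evalue_cutoff:
            best[subject] = min(evalue, best.get(subject, evalue))
    return best


def homologs_per_genome(hits_by_class, n_genomes):
    """Average number of retained homologs per genome for each class."""
    return {cls: len(hits) / n_genomes for cls, hits in hits_by_class.items()}
```

in use, one would run `filter_hits` once per halogenase class on the pooled output of all queries for that class, then pass the resulting dictionaries to `homologs_per_genome` together with the number of genomes searched.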
it is worth mentioning that, for nonheme iron/ og-dependent enzymes, a single amino acid difference can convert hydroxylation activity into halogenation [ ], so it is possible that, at least for this class, the sequence space considered does not correspond exclusively to halogenation activity. dimetal-carboxylate and flavin-dependent halogenase homologs were found to be the most abundant in cyanobacteria, each with roughly . homologs per genome, while nonheme iron/ og-dependent halogenase homologs are less common (~ . per genome) (fig. b). overall, our analyses indicate that homologs of each of the three halogenase classes are associated with a large number of orphan bgcs and represent opportunities for np discovery. it is particularly noteworthy that cylc-like enzymes are clearly a major group of halogenases in cyanobacteria, despite having been the latest to be discovered [ ].

figure . prevalence of cyanobacterial halogenases. frequency of halogenases in cyanobacteria from corason analysis (a) and ncbi blastp analysis (b). (a) dimetal-carboxylate halogenases: cylc - ncbi reference genomes, n = and legecc genomes, n = cylc-containing bgcs and genomes; flavin-dependent halogenases: prna - ncbi reference genomes, n = and legecc genomes, n = genomes; bmp - ncbi reference genomes, n = and legecc genomes, n = genomes; mcnd - ncbi reference genomes, n = and legecc genomes, n = genomes; nonheme iron/ og-dependent halogenases: halogenase domain from cura - ncbi reference genomes, n = and legecc genomes, n = genomes.
(b) average of the total number of homologs per dimetal-carboxylate halogenases (cylc, brtj, “mic”, cold, cole, noco, nocn), flavin-dependent halogenases (tryptophan -halogenase prna, bmp and mcnd) and nonheme iron/ og-dependent halogenases (barb , hctb, welo , ambo and the halogenase domain from cura).
[figure panels a) and b): plots not recoverable from the extracted text. y-axis labels: % of halogenases (corason) in a) and number of homologs (blast) in b); x-axis categories in both panels: dimetal, non-heme iron, flavin-dependent.]

conclusion

the discovery of a new biosynthetic enzyme class brings with it tremendous possibilities for biochemistry and catalysis research, both fundamental and applied. the functional characterization of such enzymes can also be used as a handle to identify and deorphanize bgcs that encode their homologs. cylc typifies an unprecedented halogenase class, which is almost exclusively found in cyanobacteria. by searching for cylc homologs in both public databases and our in-house culture collection, we report here more than new cyanobacterial cylc homologs. we found that dimetal-carboxylate halogenases are widely distributed throughout the phylum. the genomic neighborhoods of these halogenases are diverse and we identify a number of different bgc architectures associated with either one or two cylc homologs that can serve as starting points for the discovery of new np scaffolds. in addition, the herein reported diversity and biosynthetic contexts of these enzymes will serve as a roadmap to further explore their biocatalysis-relevant activities.
finally, bartoloside-like bgcs and a cylc-associated bgc architecture (nitronate monooxygenase-containing) were found only in the legecc, reinforcing the importance of geographically focused strain isolation and maintenance efforts for the cyanobacteria phylum.

conflicts of interest

the authors declare that there are no conflicts of interest.

funding information

this work was funded by fundação para a ciência e a tecnologia (fct) through grant ptdc/bia-bqm/ / to pnl and through strategic funding uid/multi/ / and by the national science foundation (nsf) through grant career- to epb. ar and rcb are supported by doctoral grants from fct (sfrh/bd/ / and sfrh/bd/ / , respectively). this material is based upon work supported by an nsf postdoctoral research fellowship in biology (grant no to nrg). any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the nsf.

acknowledgments

we thank hitomi nakamura, samantha cassell, diana sousa and joão reis for technical assistance during this study, and the blue biotechnology and ecotoxicology culture collection (legecc) for the genomic dna used for the pcr screening.

references

pham jv, yilma ma, feliz a, majid mt, maffetone n et al. a review of the microbial production of bioactive natural products and biologics. front microbiol ; ( ).
noda-garcia l, tawfik ds. enzyme evolution in natural products biosynthesis: target- or diversity-oriented? curr opin chem biol ; : - .
giani am, gallo gr, gianfranceschi l, formenti g. long walk to genomics: history and current approaches to genome sequencing and assembly. comput struct biotechnol j ; : - .
zhang mm, qiao y, ang el, zhao h. using natural products for drug discovery: the impact of the genomics era. expert opin drug discov ; ( ): - .
gkotsi ds, dhaliwal j, mclachlan mmw, mulholand kr, goss rjm. halogenases: powerful tools for biocatalysis (mechanisms applications and scope). curr opin chem biol ; : - .
agarwal v, miles zd, winter jm, eustáquio as, el gamal aa et al. enzymatic halogenation and dehalogenation reactions: pervasive and mechanistically diverse. chem rev ; ( ): - .
weichold v, milbredt d, van pée k-h. specific enzymatic halogenation - from the discovery of halogenated enzymes to their applications in vitro and in vivo. angew chem int ed ; ( ): - .
schnepel c, sewald n. enzymatic halogenation: a timely strategy for regioselective c−h activation. chem eur j ; ( ): - .
petrone da, ye j, lautens m. modern transition-metal-catalyzed carbon–halogen bond formation. chem rev ; ( ): - .
jeschke p. the unique role of halogen substituents in the design of modern agrochemicals. pest manag sci ; ( ): - .
xu z, yang z, liu y, lu y, chen k et al. halogen bond: its role beyond drug–target binding affinity for drug discovery and development. j chem inf model ; ( ): - .
hillwig ml, zhu q, ittiamornkul k, liu x. discovery of a promiscuous non-heme iron halogenase in ambiguine alkaloid biogenesis: implication for an evolvable enzyme family for late-stage halogenation of aliphatic carbons in small molecules. angew chem int ed ; ( ): - .
liu x. in vitro analysis of cyanobacterial nonheme iron-dependent aliphatic halogenases welo and ambo . methods enzymol ; : - .
pratter sm, ivkovic j, birner-gruenberger r, breinbauer r, zangger k et al. more than just a halogenase: modification of fatty acyl moieties by a trifunctional metal enzyme. chembiochem ; ( ): - .
hillwig ml, liu x. a new family of iron-dependent halogenases acts on freestanding substrates. nat chem biol ; ( ): - .
chang z, flatt p, gerwick wh, nguyen va, willis cl et al. the barbamide biosynthetic gene cluster: a novel marine cyanobacterial system of mixed polyketide synthase (pks)-non-ribosomal peptide synthetase (nrps) origin involving an unusual trichloroleucyl starter unit. gene ; ( - ): - .
flatt pm, o'connell sj, mcphail kl, zeller g, willis cl et al. characterization of the initial enzymatic steps of barbamide biosynthesis. j nat prod ; ( ): - .
galonić dp, vaillancourt fh, walsh ct. halogenation of unactivated carbon centers in natural product biosynthesis: trichlorination of leucine during barbamide biosynthesis. j am chem soc ; ( ): - .
chang z, sitachitta n, rossi jv, roberts ma, flatt pm et al. biosynthetic pathway and gene cluster analysis of curacin a, an antitubulin natural product from the tropical marine cyanobacterium lyngbya majuscula. j nat prod ; ( ): - .
edwards dj, marquez bl, nogle lm, mcphail k, goeger de et al.
structure and biosynthesis of the jamaicamides, new mixed polyketide-peptide neurotoxins from the marine cyanobacterium lyngbya majuscula. chem biol ; ( ): - .
ramaswamy av, sorrels cm, gerwick wh. cloning and biochemical characterization of the hectochlorin biosynthetic gene cluster from the marine cyanobacterium lyngbya majuscula. j nat prod ; ( ): - .
kocher s, resch s, kessenbrock t, schrapp l, ehrmann m et al. from dolastatin to cyanopeptolins, micropeptins, and lyngbyastatins: the chemical biology of ahp-cyclodepsipeptides. nat prod rep ; ( ): - .
rouhiainen l, paulin l, suomalainen s, hyytiainen h, buikema w et al. genes encoding synthetases of cyclic depsipeptides, anabaenopeptilides, in anabaena strain . mol microbiol ; ( ): - .
cadel-six s, dauga c, castets am, rippka r, bouchier c et al. halogenase genes in nonribosomal peptide synthetase gene clusters of microcystis (cyanobacteria): sporadic distribution and evolution. mol biol evol ; ( ): - .
nishizawa t, ueda a, nakano t, nishizawa a, miura t et al. characterization of the locus of genes encoding enzymes producing heptadepsipeptide micropeptin in the unicellular cyanobacterium microcystis. j biochem ; ( ): - .
nakamura h, hamer ha, sirasani g, balskus ep. cylindrocyclophane biosynthesis involves functionalization of an unactivated carbon center. j am chem soc ; ( ): - .
nakamura h, schultz ee, balskus ep. a new strategy for aromatic ring alkylation in cylindrocyclophane biosynthesis. nat chem biol ; ( ): - .
vaillancourt fh, yeh e, vosburg da, o'connor se, walsh ct.
cryptic chlorination by a non-haem iron enzyme during cyclopropyl amino acid biosynthesis. nature ; ( ): - .
kleigrewe k, almaliti j, tian iy, kinnel rb, korobeynikov a et al. combining mass spectrometric metabolic profiling with genomic analysis: a powerful approach for discovering natural products from cyanobacteria. j nat prod ; ( ): - .
leão pn, nakamura h, costa m, pereira ar, martins r et al. biosynthesis-assisted structural elucidation of the bartolosides, chlorinated aromatic glycolipids from cyanobacteria. angew chem int ed ; ( ): - .
mareš j, hájek j, urajová p, kust a, jokela j et al. alternative biosynthetic starter units enhance the structural diversity of cyanobacterial lipopeptides. appl environ microbiol ; ( ):e - .
abt k, castelo-branco r, leao pnc. biosynthesis of chlorinated lactylates in sphaerospermopsis sp. lege . chemrxiv . preprint. https://doi.org/ . /chemrxiv. .v
latham j, brandenburger e, shepherd sa, menon brk, micklefield j. development of halogenase enzymes for use in synthesis. chem rev ; ( ): - .
zallot r, oberg n, gerlt ja. the efi web resource for genomic enzymology tools: leveraging protein, genome, and metagenome databases to discover novel enzymes and metabolic pathways. biochemistry ; ( ): - .
kotai j. instructions for preparation of modified nutrient solution z for algae. norwegian institute for water res ; : .
edgar rc. muscle: multiple sequence alignment with high accuracy and high throughput. nucleic acids res ; ( ): - .
rippka r, waterbury jb, stanier ry.
isolation and purification of cyanobacteria: some general principles. in: starr mp, stolp h, trüper hg, balows a, schlegel hg (editors). the prokaryotes: a handbook on habitats, isolation, and identification of bacteria. berlin, heidelberg: springer berlin heidelberg; . pp. - .
singh sp, rastogi rp, häder d-p, sinha rp. an improved method for genomic dna extraction from cyanobacteria. world j microbiol biotechnol ; ( ): - .
wood de, salzberg sl. kraken: ultrafast metagenomic sequence classification using exact alignments. genome biol ; ( ):r .
li h, durbin r. fast and accurate short read alignment with burrows–wheeler transform. bioinformatics ; ( ): - .
bankevich a, nurk s, antipov d, gurevich aa, dvorkin m et al. spades: a new genome assembly algorithm and its applications to single-cell sequencing. j comput biol ; ( ): - .
wu yw, simmons ba, singer sw. maxbin . : an automated binning algorithm to recover genomes from multiple metagenomic datasets. bioinformatics ; ( ): - .
tatusova t, dicuccio m, badretdin a, chetvernin v, nawrocki ep et al. ncbi prokaryotic genome annotation pipeline. nucleic acids res ; ( ): - .
blin k, shaw s, steinke k, villebro r, ziemert n et al. antismash . : updates to the secondary metabolite genome mining pipeline. nucleic acids res ; (w ):w -w .
posada d. jmodeltest: phylogenetic model averaging. mol biol evol ; ( ): - .
miller ma, pfeiffer w, schwartz t, editors. creating the cipres science gateway for inference of large phylogenetic trees. gateway computing environments workshop (gce); - nov. .
navarro-muñoz jc, selem-mojica n, mullowney mw, kautsar sa, tryon jh et al. a computational framework to explore large-scale biosynthetic diversity. nat chem biol ; ( ): - .
the uniprot consortium. uniprot: the universal protein knowledgebase. nucleic acids res ; (d ):d -d .
ramos v, morais j, castelo-branco r, pinheiro Â, martins j et al. cyanobacterial diversity held in microbial biological resource centers as a biotechnological asset: the case study of the newly established lege culture collection. j appl phycol ; ( ): - .
dittmann e, gugger m, sivonen k, fewer dp. natural product biosynthetic diversity and comparative genomics of the cyanobacteria. trends microbiol ; ( ): - .
d'agostino pm, woodhouse jn, makower ak, yeung ac, ongley se et al. advances in genomics, transcriptomics and proteomics of toxin-producing cyanobacteria. environ microbiol rep ; ( ): - .
calteau a, fewer dp, latifi a, coursin t, laurent t et al. phylum-wide comparative genomics unravel the diversity of secondary metabolism in cyanobacteria. bmc genomics ; ( ): .
baran r, ivanova nn, jose n, garcia-pichel f, kyrpides nc et al. functional genomics of novel secondary metabolites from diverse cyanobacteria using untargeted metabolomics. mar drugs ; ( ): - .
alvarenga do, fiore mf, varani am. a metagenomic approach to cyanobacterial genomics. front microbiol ; : - .
beck c, knoop h, axmann im, steuer r. the diversity of cyanobacterial metabolism: genome analysis of multiple phototrophic microorganisms. bmc genomics ; ( ): .
okino t, matsuda h, murakami m, yamaguchi k.
microginin, an angiotensin-converting enzyme inhibitor from the blue-green alga microcystis aeruginosa. tetrahedron lett ; ( ): - .
voráčová k, hájek j, mareš j, urajová p, kuzma m et al. the cyanobacterial metabolite nocuolin a is a natural oxadiazine that triggers apoptosis in human cancer cells. plos one ; ( ):e .
zallot r, oberg no, gerlt ja. ‘democratized’ genomic enzymology web tools for functional assignment. curr opin chem biol ; : - .
reis jpa, figueiredo sac, sousa ml, leão pn. brtb is an o-alkylating enzyme that generates fatty acid-bartoloside esters. nat commun ; ( ): - .
liu y, klet rc, hupp jt, farha o. probing the correlations between the defects in metal-organic frameworks and their catalytic activity by an epoxide ring-opening reaction. chem commun (camb) ; ( ): - .
mitchell aj, dunham np, bergman ja, wang b, zhu q et al. structure-guided reprogramming of a hydroxylase to halogenate its small molecule substrate. biochemistry ; ( ): - .

thermal proteome profiling reveals distinct target selectivity for differentially oxidized oxysterols

cecilia rossetti, luca laraia #
department of chemistry, technical university of denmark, kemitorvet , , kgs. lyngby, denmark.
#correspondence to luclar@kemi.dtu.dk

abstract

oxysterols are produced physiologically by many species; however, their distinct roles in regulating human (patho)physiology have not been studied systematically.
the role of differing oxidation states and sites in mediating their biological functions is also unclear. as individual oxysterols have been associated with atherosclerosis, neurodegeneration and cancer, a better understanding of their protein targets would be highly valuable. to address this, we profiled three a- and b-ring oxidized sterols as well as -hydroxycholesterol using thermal proteome profiling (tpp), validating selected targets with the cellular thermal shift assay (cetsa) and isothermal dose response fingerprinting (itdrf). this revealed that the site of oxidation has a profound impact on target selectivity, with each oxysterol possessing an almost unique set of target proteins. overall, however, the targets clustered in pathways relating to vesicular transport and lipid metabolism and trafficking, suggesting that while individual oxysterols bind to a unique set of proteins, the processes they modulate are highly interconnected.

introduction

dysregulation of cholesterol homeostasis is a severe condition leading to inadequate or excessive tissue cholesterol levels. hypercholesterolemia has been identified as a common risk factor for diverse disorders, including breast, colorectal, prostatic and testicular cancer[ ] together with coronary artery and alzheimer's diseases.[ ],[ ] oxidative metabolites of cholesterol, termed oxysterols, contribute to the regulation of cholesterol homeostasis through different transcriptional and non-genomic mechanisms, which are still incompletely understood.[ ],[ ],[ ] additionally, recent research suggests that they may play distinct roles not directly connected to the regulation of cholesterol homeostasis, including mediating membrane contact sites and trafficking.
evidence has also associated increased oxysterol levels with cancer progression, the mechanisms of which remain to be elucidated.[ ] of the over twenty oxysterols identified, side-chain oxidized sterols and particularly -hydroxycholesterol ( -hc) have been the most widely studied. they have been shown to modulate the activity of cholesterol transport proteins and transcription factors involved in regulating cholesterol homeostasis. however, a- and b-ring oxidized sterols have been less well studied, in particular in relation to their target profile. those oxidized at the c position, such as -ketocholesterol ( -kc), are most frequently detected at high levels in atherosclerotic plaques[ ] and in the plasma of patients with high cardiovascular risk factors.[ ] furthermore, -kc displays toxicity at higher concentrations, accompanied by a pronounced effect on lysosomal activity.[ ] the precise mechanisms by which this occurs are still unknown. for oxysterols oxidized at -, - and -positions virtually no targets have been annotated, with the partial exception of the liver x receptor (lxr). crucially, the effect on biological activity of different oxidative modifications on the sterol backbone has not been explored. for all of the reasons above, the systematic discovery of oxysterol target proteins will be of profound importance in determining their (patho)physiological roles.

biorxiv preprint doi: https://doi.org/ . / . . . ; this version posted january , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission.

herein we describe the systematic identification of oxysterol target proteins using thermal proteome profiling (tpp). the oxidation site and state significantly affected the target profile for each oxysterol tested, with only two proteins identified as targets for more than one oxysterol.
of these, the vacuolar protein sorting associated protein (vps ) was validated more comprehensively as a protein that binds oxysterols. though different, most oxysterol targets clustered in pathways and processes related to vesicular transport as well as lipid metabolism and transfer, and most targets were localized at intracellular membranes. these results suggest specific but different roles for individual oxysterols and provide a blueprint for further studies on these important metabolites.

results and discussion

identification of oxysterol target proteins using thermal proteome profiling

to identify potential oxysterol target proteins, tpp was selected as the method of choice [ ][ ] (figure a). this method is advantageous over other target identification methods as it does not require pre-functionalization, immobilization or modification of the compound of interest and has been shown to offer excellent proteome coverage. furthermore, the use of selected detergents, including np- , has enabled the identification of a large proportion of membrane proteins, which is particularly relevant because this is where a large proportion of known sterol targets are located.[ ] we opted to carry out experiments in cell lysates for the primary screening efforts, for increased reproducibility[ ],[ ] and simpler data interpretation.[ ] the use of cell lysates enables the evaluation of direct oxysterol target engagement without additional sources of variability deriving from factors such as membrane transport, accumulation and cell metabolism, which are prominent in experiments with intact cells. we selected β-hydroxycholesterol ( β-hc), cholestane- β, α, β-triol (ct) and -ketocholesterol ( -kc) as representative a/b-ring oxidized sterols which cover all four known oxidation sites and arise through enzymatic as well as spontaneous oxidation (figure b).
the oxysterols were also selected with the aim of elucidating how the different oxidation patterns on the sterol core determine the selectivity of these important metabolites. furthermore, we also included -hydroxycholesterol ( -hc) as an oxysterol that has been more widely studied and applied, but whose complete target profile had also not been elucidated.

figure . target identification of oxysterols using thermal proteome profiling. a) workflow of the thermal proteome profiling experiments and criteria for the identification of putative oxysterol targets; b) structures of the tested oxysterols: β-hydroxycholesterol ( β-hc), cholestane- β, α, β-triol (ct), -hydroxycholesterol ( -hc) and -ketocholesterol ( -kc); c) summary table of the identified proteins from the hela proteome analysis and setting of the threshold limits for the identification of putative hits.

tpp enabled the identification and monitoring of changes in thermal stability of up to proteins upon the incubation of hela cell lysates with the different oxysterols (figure c). from these, it was possible to calculate thermal shifts for about % of the identified proteins. to define which shifts in melting temperature were significant, a threshold of two standard deviations from the median of all the calculated shifts was deemed appropriate, in line with previous reports.[ ] in the screening process, proteins with a significant change in melting temperature following oxysterol exposure were filtered according to their melting curves normalized to the lowest temperature.
proteins displaying a shift in the same direction (positive or negative Δtm) in all three replicates and with a curve plateau corresponding to a fraction of soluble protein less than or equal to . were selected as potential targets (figure a and c). the entire screening set produced a list of hits considered as putative targets for at least one of the tested oxysterols (figure a). overall, the re-identification of known cholesterol binding proteins as determined by affinity-based probes[ ] (supporting information (si) table s ) both validates the use of tpp for identifying novel oxysterol target proteins and highlights the wealth of previously unidentified sterol interactors. interestingly, the overlap of the candidate targets between the different oxysterols was remarkably low. only -kc and β-hc shared two putative interacting proteins (figure b). while this result may appear unexpected, it is in fact consistent with previously reported studies describing the binding of oxysterols to the cholesterol transport proteins niemann pick class (npc ) and aster-a (also known as gramd a). both were shown to bind -hc, but displayed no binding of a- or b-ring oxidized sterols.[ ][ ][ ] these examples and our data suggest that these particular oxysterols do not exert their function by modulating traditional (oxy)sterol-associated proteins. crucially, our data suggest that different oxysterols show distinct target profiles that are dependent on the position and level of oxidation. despite marked differences in their individual target profiles, some general trends could be observed clearly.
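the hit-selection criteria described earlier in this section (a shift beyond two standard deviations from the median of all shifts, the same direction in all three replicates, and a low curve plateau) can be sketched as a small python filter. this is a minimal sketch under stated assumptions, not the authors' implementation: replicate shifts are summarized by their mean, the standard deviation is the ordinary sample standard deviation of those means, each protein has a single plateau value, and `PLATEAU_MAX` is a hypothetical placeholder because the actual plateau limit is elided in the text. the function name `select_hits` is invented for illustration.

```python
from statistics import median, stdev

# Hypothetical placeholder: the source text elides the real plateau limit.
PLATEAU_MAX = 0.3


def select_hits(shifts, plateaus, n_sd=2.0, plateau_max=PLATEAU_MAX):
    """Return the sorted names of proteins that pass all three criteria.

    shifts:   {protein: [delta-Tm for each replicate]}
    plateaus: {protein: soluble fraction at the melting-curve plateau}
    """
    # Assumption: summarize each protein by the mean of its replicate shifts.
    mean_shift = {p: sum(v) / len(v) for p, v in shifts.items()}
    values = list(mean_shift.values())
    center = median(values)   # median of all calculated shifts
    spread = stdev(values)    # sample standard deviation of all shifts

    hits = []
    for protein, reps in shifts.items():
        # Same sign of delta-Tm in every replicate.
        same_direction = all(d > 0 for d in reps) or all(d < 0 for d in reps)
        # More than n_sd standard deviations away from the median.
        significant = abs(mean_shift[protein] - center) > n_sd * spread
        if same_direction and significant and plateaus[protein] <= plateau_max:
            hits.append(protein)
    return sorted(hits)
```

applying the function separately to the data for each oxysterol would give one hit list per compound; intersecting those lists (for example with python sets) is then enough to count shared putative targets.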
the functional enrichment analysis of the identified candidate targets (figure a) showed an enrichment of the intracellular membrane compartments (highlighted in red and listed in si table s ). of these, several proteins associated with clathrin-coated vesicle (ccv) transport were identified (figure c). ccv transport is known to require cholesterol;[ ] however, except for osbp, the mediators of this effect were unknown, and oxysterols had not been suggested to modulate this process. trans-golgi network (tgn) membrane-associated proteins were also significantly targeted. as ccvs are known to also form at the tgn, this could suggest an overall link between oxysterols and ccv transport. unexpectedly, a large proportion of the rna polymerase iii transcription complex was identified as putative oxysterol targets (figure c). in particular, constituents of the super elongation complex (sec) were highly enriched, with β-hc targeting several of the components (vide infra). perhaps unsurprisingly, among the reactome pathways enriched in the string analysis, the metabolism of lipids was identified (figure a, highlighted in blue, and si table s ). this was due to the presence of known sterol biosynthetic and metabolic proteins, but also to a large number of lipid kinases of different classes. in particular, phosphatidyl inositol kinases (piks), which were targeted by multiple oxysterols, contributed to the enrichment of phosphatidylinositol metabolism in both reactome and kegg pathways (si table s and figure c, respectively) and to the enrichment of phosphatidylinositol (phosphate) kinase activity as the most significant molecular function in the go analysis (figure e). proteins that regulate the mechanistic target of rapamycin complex (mtorc ), either directly or indirectly, were also abundant, confirming its essential role in regulating lipid metabolism.
figure . analysis of all the putative targets for the tested oxysterols β-hc, -hc, ct and -kc. a) string functional analysis with proteins from intracellular membrane-bounded organelle highlighted in red (go: ; fdr: . e- ), and proteins involved in the metabolism of lipids in blue (hsa- ; fdr: . ); b) venn diagram of the putative targets for each of the oxysterols and overlap among common targets; c) go cellular components enriched from the analysis of all the putative targets; d) kegg pathway analysis and target contribution for each pathway. pathways are colored according to their significance from orange to white to indicate p-values from . to . ; e) go molecular functions enriched from the analysis of all the putative targets.
proteome-wide profiling of -kc
we focused our initial analysis of specific oxysterol target proteins on -kc, as it is the most prominent and toxic of the non-enzymatically produced oxysterols. several known and novel -kc targets with significant Δtm (figure a and b) were identified. for example, squalene monooxygenase (sqle) is a key cholesterol biosynthetic enzyme.[ ] -kc has previously been shown to lead to sqle degradation, analogously to cholesterol and other sterols oxidized at the -position. the identification of a known -kc target protein further confirms that tpp is a suitable approach for oxysterol target protein identification. importantly, destabilized proteins were considered as putative targets (si table s ).
among them, brisc and brca -a complex member (bre) is known to be involved in the defective synthesis of steroid hormones and the accumulation of large quantities of cholesterol under stress or under the influence of steroid hormones.[ ],[ ] among other stabilized proteins, several are involved in pi metabolism, including pip k a, which is the main source of cellular pi , p . nuclear receptor-binding factor (nrbf ) is known to modulate pi k-iii activity by stabilization of the vps complex i, a key autophagy-related kinase.[ ]
figure . tpp analysis of -kc. a) melting temperature shifts of the entire hela proteome. significant shifts lie outside the standard deviation interval marked with dotted lines. b) string functional analysis of the putative targets selected from the tpp screening assay. c) melting curves of squalene monooxygenase (sqle), e ubiquitin-protein ligase rnf (rnf ), vacuolar protein sorting-associated protein homolog (vps ) and the ragulator protein complex protein lamtor . data are mean ± sem of three independent experiments.
the two targets most stabilized by -kc were associated with lysosomal functions (si table s ). e ubiquitin-protein ligase rnf and ragulator complex protein lamtor are both localized in lysosomes, where they perform ubiquitin protein ligase activity and regulation of tor signaling activity, respectively (see figure c for associated melting curves). rnf has been found to regulate ampa receptor-mediated synaptic transmission,[ ] while the lamtor complex regulates mtor signaling and thus cellular lipid metabolism more generally. the modulation of these targets may begin to explain the phenotypic effects elicited by -kc.[ ]
accumulation of -kc in the lysosomes is thought to alter ph maintenance, reducing their ability to hydrolyze and process cellular debris, as has already been described for lysosomal accumulation of cholesterol.[ ] the presence of vps among the most stabilized proteins was intriguing, as it was one of only two proteins (the other being aar splicing factor homolog) identified as putative new targets for more than one oxysterol ( -kc and β-hc). vps is part of the golgi-associated retrograde protein (garp) complex, which is known to regulate cholesterol transport between early and late endosomes and the trans-golgi network (tgn) via lysosomal npc .[ ] however, a direct interaction of vps with cholesterol, or indeed any sterol, had not been reported. thus we selected vps for further validation. the tpp results were initially validated by means of a cellular thermal shift assay (cetsa) with western blot read-out. for both -kc and β-hc, we were able to reproduce the stabilization observed in the tpp experiment (figure a and b, respectively), although the thermal shift was less pronounced for β-hc. to address this discrepancy, we carried out an isothermal dose-response fingerprinting (itdrf) experiment, which showed that β-hc stabilized vps in a dose-dependent manner at °c, confirming their putative interaction (figure c).
figure . target validation of vps . a) cetsa experiments for the validation of vps with -kc; b) cetsa experiments for the validation of vps with β-hc; c) itdrf experiment for the validation of vps with β-hc, with related dose-response curve. both reported isoforms of vps are visible. data are the mean of two independent experiments; representative blots are shown.
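the thermal shifts reported throughout this section compare the melting temperature of a protein with and without compound. as a rough illustration of the quantity being compared, the sketch below estimates a melting temperature by linear interpolation of the point where the soluble fraction (normalized to the lowest temperature) crosses one half, and takes the difference between treated and vehicle curves. this is a simplification under stated assumptions: tpp and cetsa analyses normally fit a sigmoidal model, and the function names and data here are hypothetical.

```python
def melting_temperature(temps, fractions, level=0.5):
    """Estimate the temperature at which the soluble fraction falls to `level`.

    temps: increasing temperatures.
    fractions: soluble fraction at each temperature, normalized so the
    lowest temperature is 1.0. Uses linear interpolation between the two
    points that bracket `level` (a stand-in for a sigmoidal fit).
    """
    points = list(zip(temps, fractions))
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if f0 >= level >= f1 and f0 != f1:
            return t0 + (f0 - level) * (t1 - t0) / (f0 - f1)
    raise ValueError("curve does not cross the requested level")


def thermal_shift(temps, vehicle, treated):
    """Delta-Tm: positive means stabilization, negative destabilization."""
    return melting_temperature(temps, treated) - melting_temperature(temps, vehicle)


# Invented example curves for one protein.
temps = [37, 41, 45, 49, 53, 57]
vehicle = [1.0, 0.9, 0.6, 0.3, 0.1, 0.05]
treated = [1.0, 0.95, 0.8, 0.5, 0.2, 0.1]
print(round(thermal_shift(temps, vehicle, treated), 2))
```

an itdrf experiment inverts this design: the temperature is held fixed and the soluble fraction is followed as a function of compound concentration.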
putative targets of β-hydroxy cholesterol
in addition to vps , β-hc appeared to target cholesterol transport in other ways (si table s and figure s ). targets were clearly enriched in vacuolar transport processes, including vps , vps a and pik r (vps ) (figure s , green). vps is a component of the class iii pi k complex, which is a key component of autophagy initiation, strengthening the link observed with other oxysterols. vps a has been extensively associated with cholesterol transport, in a function not directly governed by its role in disassembling the endosomal sorting complex required for transport (escrt-iii) polymer.[ ][ ] the stabilization of translation initiation factors eif a and eif b may be connected to the more general targeting of other mtor regulators including lamtor and by oxysterols, since it has been shown that the mtor complex mediates assembly of the translation preinitiation complex (pic), modulating the function of eif in the translation of mrnas encoding proteins.[ ] very recently, β-hc has been shown to act as a pro-lipogenic factor by enhancing sterol regulatory element binding protein c (srebp c) expression in an lxr-dependent manner.[ ] in this context, we found that β-hc (de)stabilized a series of transcriptional regulators, including the general transcription factor c (gtf c) and two components of the super elongation complex, cyclin-dependent kinase (cdk ) and af /fmr family member (aff ). this raises the possibility that transcriptional elongation of srebp c may require β-hc’s ability to interact with the sec.
putative targets of -hydroxy cholesterol
putative targets of -hc were strongly enriched in pi metabolism (si figure s ). stabilization of the phosphatidylinositol -phosphate -kinase c domain-containing subunit alpha (pik c a) and destabilization of the related subunit beta (pik c b) allowed the identification of two of the three isoforms of the class ii pi ks.
these are known to play key roles in clathrin-mediated endocytosis.[ ] the ability of -hc to modulate pik c a was tested in an enzymatic kinase profiling assay; however, no change in kinase activity was observed (si table s ). this does not necessarily invalidate the target, as binding in an allosteric pocket may modulate protein-protein interactions rather than enzymatic activity. similarly, all other putative kinase targets of oxysterols (cdk , phkg and pip k a) were tested with the related assays without showing a significant increase or decrease in enzymatic activity (si table s -s ). unsurprisingly, -hc also stabilized regulators of cholesterol biosynthesis and metabolism, including -dehydrocholesterol reductase (dhcr ). dhcr catalyzes the last step in the biosynthesis of cholesterol and, when mutated, has been associated with the developmental disease smith-lemli-opitz syndrome.[ ] binding to oxysterols such as -kc is known to induce its proteasomal degradation; interestingly, however, this effect was not reported for -hc.[ ] the stabilization of host cell factor (hcfc ) by -hc could be linked to the regulation of an intragenic region in the hcfc gene by sterol regulatory element-binding protein (srebp ),[ ] which is itself regulated by cholesterol and -hc.[ ] finally, -hc targeted the ragulator complex protein lamtor and the transmembrane superfamily member (ensg ), a protein involved in autophagy,[ ] which is known to be regulated by cholesterol metabolism.[ ]
putative targets of cholestane triol
ct targets were significantly enriched in autophagic and golgi-associated proteins (si figure s ).
the most stabilized protein was paxillin (pxn), an autophagy substrate that interacts with lc during focal adhesion (fa) disassembly in highly metastatic tumor cells.[ ] fa turnover is reportedly influenced by orp -mediated lipid exchange,[ ] which may explain the association of oxysterols with pxn. a significantly destabilized target was the microtubule-associated protein s (map s), whose deficiency causes impaired autophagic degradation of lipid droplets, which then accumulate in normal renal epithelial cells, initiating the development of renal cell carcinomas.[ ] golgi phosphoprotein -like (golph l) was the most destabilized putative target. interestingly, this protein is also associated with the akt/mtor pathway, since it contributes to the tumorigenesis of hepatocellular carcinoma by increasing cell proliferation through the activation of mtor signaling via overexpression of mtorc .[ ] adaptin ear-binding coat-associated protein (necap ) promotes fast endocytic recycling of epidermal growth factor receptor (egfr) and of the tumor necrosis factor receptor (tfnr) through the recruitment of ap- –clathrin machinery to early endosomes. to facilitate receptor recycling, early endosomes receive endocytosed material from clathrin-dependent and -independent pathways and sort cargo for recycling to the cell surface, retrograde transport to the golgi or degradation in lysosomes.[ ] necap sits at a node in the overall oxysterol target interaction map, and would thus be an intriguing target for further study.
conclusion
in summary, we have carried out the first systematic exploration of oxysterol target proteins using thermal proteome profiling as the enabling technology. tpp proved convenient for screening small compound sets such as the four oxysterols we selected, as it does not require compound modification or functionalization.
furthermore, previously identified sterol-binding proteins were re-identified here, validating the approach.[ ] strikingly, our results demonstrate that oxysterols, which differ from cholesterol by the addition of just one or two oxygen atoms, display distinct target profiles, with only two proteins identified as targets of more than one oxysterol. to the best of our knowledge, this has never been conclusively shown or systematically studied. although virtually no overlap between the oxysterol targets was present, targets were enriched in lipid metabolism, mtor signaling, vesicle trafficking and transcriptional regulators. the intracellular membrane localization of most target proteins is also consistent with the lipophilic nature of the compounds and their reported membrane association. of the two proteins that are putative targets of two oxysterols, vps was further validated using cetsa and itdrf experiments. although its role in mediating cholesterol transport by targeting npc to the lysosomes as part of the garp complex is known, our data raise the intriguing possibility that this event is regulated by (oxy)sterols themselves. the specific target profiles of the individual oxysterols studied may also begin to explain the phenotypes they induce. in particular, -kc has previously been shown to affect lysosomal integrity and activity. the fact that several of the putative targets identified are lysosomal membrane proteins may begin to offer an explanation for this observed effect. importantly, future work will be necessary to determine whether target (de)stabilization by oxysterols occurs through direct binding or is mediated by a complex. to conclude, tpp is a robust technology to identify new oxysterol target proteins, and the data provided herein offer an extensive resource as well as a wealth of testable hypotheses linking oxysterols to lipid metabolism and transport, vesicle trafficking and transcription. despite the exciting developments achievable with this technique, it is important to note that, like all target identification methods, tpp also has its caveats. false negatives are more common with this technique, as target proteins may not be (de)stabilized by the small molecules they interact with, or very high compound concentrations may be required to observe a meaningful effect. this was particularly apparent for known -hc protein targets including osbp, npc and certain stards, which were not identified as putative targets although they are present in our ms data and more generally in meltome analyses.[ ] osbp, stard , npc and npc proteins were identified in the hela cell proteome, but their thermal shift was not considered significant according to the chosen criteria or was not determined in all three replicates. in this regard, the arbitrary exclusion of proteins with shifts lower than two standard deviations from the median might particularly affect the recognition of protein targets of compounds whose meltome is more generally altered relative to the dmso control. while the use of np- facilitates recovery of membrane proteins, it has recently been shown that different detergent types and concentrations can affect which proteins are recovered in the final analysis, introducing a slight bias.[ ] despite this, we believe that tpp and its variants including itdrf will be applied increasingly for (off)-target identification and validation.
acknowledgements
we would like to thank assoc. prof. erwin schoof from dtu proteomics core for excellent advice and support and prof. ulrich auf dem keller for access to cell culture and reagents at dtu bioengineering. we would also like to thank dr.
petra janning and malte metz for invaluable advice regarding the data analysis. we would also like to acknowledge the novo nordisk foundation (nnf oc ) and dtu for funding. references [ ] x. ding, w. zhang, s. li, h. yang, am. j. cancer res. , , – . [ ] s. macmahon, s. duffy, a. rodgers, s. tominaga, l. chambless, g. de backer, d. de bacquer, m. kornitzer, p. whincup, s. g. wannamethee, r. morris, n. wald, j. morris, m. law, m. knuiman, h. bartholomew, g. davey smith, p. sweetnam, p. elwood, j. yarnell, r. kronmal, d. kromhout, s. sutherland, j. keil, g. jensen, p. schnohr, c. hames, a. tyroler, a. aromaa, p. knekt, a. reunanen, j. tuomilehto, p. jousilahti, e. vartiainen, p. puska, t. kuznetsova, t. richart, j. staessen, l. thijs, t. jorgensen, t. thomsen, d. sharp, j. d. curb, n. qizilbash, h. iso, s. sato, a. kitamura, y. naito, a. benetos, l. guize, u. goldbourt, m. tomita, y. nishimoto, t. murayama, m. criqui, c. davis, c. hart, d. hole, c. gillis, d. jacobs, h. blackburn, r. luepker, j. neaton, l. eberly, c. cox, d. levy, r. d’agostino, h. silbershatz, a. tverdal, r. selmer, t. meade, k. garrow, j. cooper, f. speizer, m. stampfer, a. menotti, a. spagnolo, i. tsuji, y. imai, t. ohkubo, s. hisamichi, l. haheim, i. holme, i. hjermann, p. leren, p. ducimetiere, j. empana, k. jamrozik, r. broadhurst, g. assmann, h. schulte, c. bengtsson, c. björkelund, l. lissner, p. sorlie, m. garcia- palmieri, e. barrett-connor, r. langer, k. nakachi, k. imai, x. fang, s. li, r. buzina, a. nissinen, c. aravanis, a. dontas, a. kafatos, h. adachi, h. toshima, t. imaizumi, s. nedeljkovic, m. ostojic, z. chen, h. tunstall-pedoe, t. nakayama, n. yoshiike, t. yokoyama, c. date, h. tanaka, j. keller, k. bonaa, e. arnesen, e. rimm, m. gaziano, j. e. buring, c. hennekens, s. törnberg, j. carstensen, m. shipley, d. leon, m. marmot, j. armitage, c. baigent, r. clarke, r. collins, j. emberson, j. halsey, m. landray, s. lewington, a. palmer, s. parish, r. peto, p. sherliker, g. 
whitlock, lancet , , – . [ ] j. e. vance, dis. model. mech. , , – . [ ] v. mutemberezi, o. guillemot-legris, g. g. muccioli, prog. lipid res. , , – . [ ] s. gill, j. stevenson, i. kristiana, a. j. brown, cell metab. , , – . [ ] a. a. bielska, p. schlesinger, d. f. covey, d. s. ory, trends endocrinol. metab. , , – . [ ] a. kloudova, f. p. guengerich, p. soucek, trends endocrinol. metab. , , – . [ ] a. j. brown, w. jessup, atherosclerosis , , – . [ ] q. zhou, e. wasowicz, b. handler, l. fleischer, f. a. kummerow, atherosclerosis , , – . [ ] a. anderson, a. campo, e. fulton, a. corwin, w. g. jerome rd, m. s. o’connor, redox biol. , , . [ ] l. dai, n. prabhu, l. y. yu, s. bacanu, a. d. ramos, p. nordlund, annu. rev. biochem. , , – . [ ] t. friman, bioorg. med. chem. , , . [ ] a. kawatkar, m. schefter, n.-o. hermansson, a. snijder, n. dekker, d. g. brown, t. lundbäck, a. x. zhang, m. p. castaldi, acs chem. biol. , , – . [ ] l. dai, t. zhao, x. bisteau, w. sun, n. prabhu, y. t. lim, r. m. sobota, p. kaldis, p. nordlund, cell , , - .e . [ ] i. becher, a. andrés-pons, n. romanov, f. stein, m. schramm, f. baudin, d. helm, n. kurzawa, a. mateus, m.-t. mackmull, a. typas, c. w. müller, p. bork, m. beck, m. m. savitski, cell , , - .e . [ ] k. a. ball, k. j. webb, s. j. coleman, k. a. cozzolino, j. jacobsen, k. r. jones, m. h. b. stowell, w. m. old, commun. biol. , , . [ ] s. a. peck justice, m. p. barron, g. d. qi, h. r. s. wijeratne, j. f. victorino, e. r. simpson, j. z. vilseck, a. b. wijeratne, a. l. mosley, j. biol. chem. , jbc.ra . . [ ] j. j. hulce, a. b. cognetta, m. j. niphakis, s. e. tully, b. f. cravatt, nat. methods , , – . [ ] r. e. infante, l. abi-mosleh, a. radhakrishnan, j. d. dale, m. s. brown, j.
l. goldstein, j. biol. chem. , , – . [ ] r. e. infante, a. radhakrishnan, l. abi-mosleh, l. n. kinch, m. l. wang, n. v grishin, j. l. goldstein, m. s. brown, j. biol. chem. , , — . [ ] l. laraia, a. friese, d. p. corkery, g. konstantinidis, n. erwin, w. hofer, h. karatas, l. klewer, a. brockmeyer, m. metz, b. schölermann, m. dwivedi, l. li, p. rios-munoz, m. köhn, r. winter, i. r. vetter, s. ziegler, p. janning, y.-w. wu, h. waldmann, nat. chem. biol. , , – . [ ] s. k. rodal, g. skretting, Ø. garred, f. vilhardt, b. van deurs, k. sandvig, mol. biol. cell , , – . [ ] j. miao, n. s. panesar, k.-t. chan, f. m. m. lai, n. xia, y. wang, p. j. johnson, j. y. h. chan, j. histochem. cytochem. , , – . [ ] j. miao, k. w. chan, g. g. chen, s. y. chun, n. s. xia, j. y. h. chan, n. s. panesar, j. endocrinol. , , – . [ ] j. lu, l. he, c. behrends, m. araki, k. araki, q. jun wang, j. m. catanzaro, s. l. friedman, w.-x. zong, m. i. fiel, m. li, z. yue, nat. commun. , , . [ ] m. p. lussier, b. e. herring, y. nasu-nishimura, a. neutzner, m. karbowski, r. j. youle, r. a. nicoll, k. w. roche, proc. natl. acad. sci. u. s. a. , , – . [ ] w. g. jerome, b. e. cox, e. e. griffin, j. c. ullery, microsc. microanal. , , – . [ ] j. wei, y.-y. zhang, j. luo, j.-q. wang, y.-x. zhou, h.-h. miao, x.-j. shi, y.-x. qu, j. xu, b.-l. li, b.-l. song, cell rep. , , – . [ ] x. du, a. s. kazim, i. w. dawes, a. j. brown, h. yang, traffic , , – . [ ] n. bishop, p. woodman, mol. biol. cell , , – . [ ] r. marchione, s. a. leibovitch, j.-l. lenormand, cell. mol. life sci. , , – . [ ] o. moldavski, p.-j. h. zushin, c. a. berdan, r. j. van eijkeren, x. jiang, m. qian, d. s. ory, d. f. covey, d. k. nomura, a. stahl, e. j. weiss, r. zoncu, biorxiv , . . . . [ ] y. posor, m. eichhorn-gruenig, d. puchkov, j. schöneberg, a. ullrich, a. lampe, r. müller, s. zarbakhsh, f. gulluni, e. hirsch, m. krauss, c. schultz, j. schmoranzer, f. noé, v. haucke, nature , , – . [ ] b. u. fitzky, m. witsch-baumgartner, m. erdel, j. n. 
lee, y.-k. paik, h. glossmann, g. utermann, f. f. moebius, proc. natl. acad. sci. u. s. a. , , – . [ ] a. v prabhu, w. luu, l. j. sharpe, a. j. brown, j. biol. chem. , , – . [ ] m. motallebipour, s. enroth, t. punga, a. ameur, c. koch, i. dunham, j. komorowski, j. ericsson, c. wadelius, febs j. , , – . [ ] c. m. adams, j. reitz, j. k. de brabander, j. d. feramisco, l. li, m. s. brown, j. l. goldstein, j. biol. chem. , , – . [ ] p. he, z. peng, y. luo, l. wang, p. yu, w. deng, y. an, t. shi, d. ma, autophagy , , – . [ ] e. piscianz, l. vecchi brumatti, a. tommasini, a. marcuzzi, neural regen. res. , , – . [ ] m. n. sharifi, e. e. mowers, l. e. drake, c. collier, h. chen, m. zamora, s. mui, k. f. macleod, cell rep. , , – . [ ] r. s. d’souza, j. y. lim, a. turgut, k. servage, j. zhang, k. orth, n. g. sosale, m. j. lazzara, j. allegood, j. e. casanova, elife , , doi . /elife. . [ ] g. xu, y. jiang, y. xiao, x. d. liu, f. yue, w. li, x. li, y. he, x. jiang, h. huang, q. chen, e. jonasch, l. liu, oncotarget , , – . [ ] h. liu, x. wang, b. feng, l. tang, w. li, x. zheng, y. liu, y. peng, g. zheng, q. he, bmc cancer , , . [ ] j. p. chamberland, l. t. antonow, m. dias santos, b. ritter, j. cell sci. , , lp – . [ ] a. jarzab, n. kurzawa, t. hopf, m. moerch, j. zecha, n. leijten, y. bian, e. musiol, m. maschberger, g. stoehr, i. becher, c. daly, p. samaras, j. mergner, b. spanier, a. angelov, t. werner, m. bantscheff, m. wilhelm, m. klingenspor, s. lemeer, w. liebl, h. hahne, m. m. savitski, b. kuster, nat. methods , , – .
molecular dynamics simulations and functional studies reveal that hbd- binds sars-cov- spike rbd and blocks viral entry into ace expressing cells
liqun zhang , , santosh k. ghosh , , shrikanth c. basavarajappa , , jeannine muller-greven , jackson penfield , ann brewer , parameswaran ramakrishnan , , matthias buck , and aaron weinberg , ,
chemical engineering, tennessee technological university, cookeville, tn
biological sciences, school of dental medicine, case western reserve university, cleveland, oh
department of pathology, school of medicine, case western reserve university, cleveland, oh
department of physiology and biophysics, school of medicine, case western reserve university, cleveland, oh
contributed equally
lead contact
correspondence: pxr @case.edu (pr); mxb @case.edu (mb); axw @case.edu (aw)
.cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . /
abstract: new approaches to complement vaccination are needed to combat the spread of sars-cov- and stop covid- related deaths and long-term medical complications. human beta defensin (hbd- ) is a naturally occurring epithelial cell derived host defense peptide that has antiviral properties. our comprehensive in-silico studies demonstrate that hbd- binds the site on the cov- -rbd that docks with the ace receptor.
biophysical and biochemical assays confirm that hbd- indeed binds to the cov- - receptor binding domain (rbd) (kd ~ nm), preventing it from binding to ace expressing cells. importantly, hbd- shows specificity by blocking cov- /spike pseudoviral infection, but not vsv-g mediated infection, of ace expressing human cells with an ic of . ± . µm. these promising findings offer opportunities to develop hbd- and/or its derivatives and mimetics for safe and effective use as novel agents to prevent sars-cov- infection.
key words: human beta defensin- (hbd- ), ace receptor, receptor binding domain (rbd), sars-cov- , covid-
introduction
the ongoing covid- pandemic, the result of infection by sars-coronavirus- (cov- ), continues to infect people worldwide, having claimed over . million lives (johns hopkins university) as of late december . while the first vaccines are now being administered, albeit initially to a select population, the virus continues to evolve in significant ways. this situation requires the discovery of novel therapeutic approaches, possibly to be used independently or in conjunction with existing approved regimens, to impede the virus’ relentless spread. all coronaviruses, including cov- , express the all-important s (spike) protein that gives these viruses the characteristic corona or crown appearance (siu et al., ; yoshimoto, ). the s protein is responsible for binding to the host cell receptor followed by fusion of the viral and cellular membranes (walls et al., ).
to engage a host cell receptor, the receptor-binding domain (rbd) of the s protein undergoes hinge-like conformational movements that transiently hide or expose its determinants for receptor binding (wrapp et al., ). structural fluctuations of the rbd, relative to the entire s protein, enable exposure of the receptor-binding motif (rbm), which mediates interaction with the receptor angiotensin-converting enzyme (ace ) on the host cell (lan et al., ; mccallum et al., ; walls et al., ; yan et al., ). since this is believed to be the critical initial event in the infection cascade, the rbd has been proposed as a potential target for therapeutic strategies (tai et al., ). the high degree of dynamics of the rbd:ace complex (brielle et al., ; ghorbani et al., ; spinello et al., ; xiong et al., ), suggests that binding of small flexible proteins and peptides may inhibit spike protein:host cell receptor interactions, which can be interrogated by computational modeling and simulations most suitable for exploring these interactions (amaro and mulholland, ). nature’s own antimicrobial peptides (amps) have been proposed as multifunctional defenses that participate in the elimination of pathogenic microorganisms, including bacteria, fungi, and viruses (diamond et al., ). exhibiting antimicrobial and immunomodulatory properties, amps have been intensively studied as alternatives and/or adjuncts to antibiotics in bacterial infections and have also gained substantial attention as anti-viral agents (mulder et al., ). human beta defensins (hbds), the major amp group expressed naturally in mucosal epithelium, provide a first-line of defense against various infectious pathogens, including enveloped viruses (leikina et al., ; quiñones-mateu et al., ; ryan et al., ). 
the hbds are cationic peptides, which assume small β-sheet structures varying in length from to amino acid residues and which are primarily expressed by epithelial cells (bensch et al., ; harder et al., ; harder et al., ; schibli et al., ). hbd- has been shown to express throughout the respiratory epithelium from the oral cavity to the lungs and, it is believed that this defensin plays a very important role in defense against respiratory infections (diamond et al., ). altered hbd- expression in the respiratory epithelium is known to be associated with the pathogenesis of several respiratory diseases such as asthma, pulmonary fibrosis, pneumonia, tuberculosis and rhinitis, (diamond et al., ; doss et al., ; ooi et al., ; rivas-santiago et al., ; semple and dorin, ). hbd- has been demonstrated to inhibit human respiratory syncytial virus (rsv) infection by blocking viral entry through destabilization/disintegration of the viral envelope (kota et al., ). it might also have important immunomodulatory roles during coronavirus infection as well, as hbd- conjugated to the mers receptor binding domain (rbd) has been reported in a mouse model to promote better protective antibodies to rbd than rbd alone (kim et al., ). in the present study, we examined the ability of hbd- to act as a blocking agent against cov- . hbd- is an amphipathic, beta-sheeted, highly cationic (+ charge) molecule of amino acids, and is stabilized by three intramolecular disulfide bonds that protects it from degradation by proteases (sawai et al., ). the protein has been studied before with molecular dynamics simulations, (yeasmin et al., ) (barros et al., ; ghorbani et al., ; spinello et al., ). through extensive in silico docking .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . 
; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / and molecular dynamic simulation analyses we report herein that hbd- binds to the receptor binding motif (rbm) of the rbd of cov- that associates with the ace receptor. biophysical and biochemical studies confirmed that hbd- binds the rbd and also prevents it from binding ace . moreover, by utilizing a physiologically relevant platform, we revealed that hbd- effectively blocks cov- spike expressing pseudovirions from entering ace expressing human cells. harnessing the utility of naturally occurring amps, such as hbd- , and their derived smaller peptides, could be a viable approach at developing novel cov- therapeutics. results: interrogating the interaction of sars-cov- rbd with ace and hbd- using in silico docking and molecular dynamics simulations rbd:ace complex: we began our in silico work by running, as a reference, a ns all-atom molecular dynamics (md) simulation of the ace :rbd complex. the final structure was compared with the initial experimental crystal structure (lan et al., ), as shown in figure s a. only small deviations are seen in some of the loop regions and at the n- and c-termini of both proteins; the overall rms deviation (rmsd), of the structure, calculated for backbone ca atoms, is around . Å, for ace , around . Å for the rbd and around . Å for the complex (figure s b) [supplementary information]. the result of calculating the rms fluctuation (rmsf) for the ca atom of each residue in the rbd and in ace is shown in figure a (left and right). overall, the main-chain fluctuations in the rbd and ace are small with a magnitude of around . Å for the most structured, α-helical and β-sheet parts. as can be seen, the loop regions are more flexible, having a higher rmsf of up to Å. the difference in fluctuations between ace and rbd in their bound and free states in solvent are shown in figure b. 
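as an illustration of the two quantities used above, the following is a minimal sketch of how rmsd and rmsf could be computed from ca coordinates. it assumes the frames have already been superimposed on the reference structure; the function names and the toy coordinates are hypothetical and are not taken from the study.

```python
import math

def rmsd(coords, ref):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points."""
    sq = sum((a - b) ** 2 for p, q in zip(coords, ref) for a, b in zip(p, q))
    return math.sqrt(sq / len(ref))

def rmsf(frames):
    """Per-atom root-mean-square fluctuation around the time-averaged position."""
    n_frames, n_atoms = len(frames), len(frames[0])
    out = []
    for i in range(n_atoms):
        # time-averaged position of atom i
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        # mean squared displacement from that average
        msf = sum(
            sum((f[i][k] - mean[k]) ** 2 for k in range(3)) for f in frames
        ) / n_frames
        out.append(math.sqrt(msf))
    return out

# toy trajectory (hypothetical): 2 frames, 2 atoms, pre-aligned
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
reference = frames[0]
rmsd_last = rmsd(frames[-1], reference)  # last frame versus starting structure
rmsf_values = rmsf(frames)               # one value per atom
```

subtracting the rmsf of the free protein from that of the bound protein, residue by residue, would give the kind of difference profile described for the bound and free states.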
as is usually expected, most regions at the rbd:ace interaction interface become less flexible (shaded in blue), while other changes, including increases in fluctuations (shaded in red) are seen further away from the interface, consistent with the recent description of allostery in the spike protein (gross et al., ; ray et al., ). upon complex formation, the rbd and ace proteins form intermolecular hydrogen bonds, which is one of the driving forces for their binding. these bonds, calculated over the course of the ns simulation, are plotted in figure c. in the first ns, the average number of hydrogen bonds fluctuates between - , but settles at a slightly lower number, - , at the end. importantly, these bonds are highly dynamic with occupancy between - %. hydrogen bonds with good persistency are listed in the table in figure . ace residues lys and gly , tyr and asn , asp and lys formed hydrogen bonds with duration of at least %. in total, of h-bonds of the rbd:ace interface in the crystal structure (lan et al., ) are populated with reasonable occupancy in the simulations. similar behavior has been seen in other simulations (ghorbani et al., ; spinello et al., ) with the difference likely explained by solution vs. crystallization conditions. water molecules were observed at the interface in other simulations and are likely bridging the interactions (malik et al., ), also underscoring the dynamic nature of the interactions (see below). to further indicate the overall stability of the interface in the simulations, we calculated the solvent accessible surface area, which is buried between the rbd and ace proteins in the complex. during ns, this buried surface area fluctuates between a minimum of Å to a maximum of Å , but this is maintained at an average of ~ Å over the last ns of the trajectory. we also calculated the distance map between ace and rbd atoms, which are closer than Å on average as a reference (see below). 
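the hydrogen-bond occupancy and the buried surface area discussed above can be sketched as follows. this is a hedged illustration only: the bond labels, the per-frame presence flags and the surface-area numbers are hypothetical, and the buried area is taken here as the sum over both partners (some authors divide this value by two).

```python
def hbond_occupancy(presence):
    """Percent of frames in which each hydrogen bond is present.

    presence maps a bond label to a list of booleans, one per frame.
    """
    return {
        bond: 100.0 * sum(flags) / len(flags)
        for bond, flags in presence.items()
    }

def buried_surface_area(sasa_a, sasa_b, sasa_complex):
    """Surface area buried on complex formation, summed over both partners."""
    return sasa_a + sasa_b - sasa_complex

# hypothetical per-frame presence flags for two bonds over four frames
presence = {
    "donor_a:acceptor_b": [True, True, False, True],
    "donor_c:acceptor_d": [False, True, False, False],
}
occupancy = hbond_occupancy(presence)

# hypothetical solvent accessible surface areas (in square angstrom)
buried = buried_surface_area(5000.0, 3000.0, 7200.0)
```

a bond that appears in only a fraction of the frames, as in the second toy entry, corresponds to the dynamic, partially occupied interactions described in the text.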
rbd:hbd- (monomer): to explore possible initial bound structures of the two proteins, we carried out docking with cluspro and haddock (see methods in supplement). the best predicted models were used as starting structures for all-atom md, as above; however, since the initial docked structures are not well converged, we carried out the simulations for up to ns. we also ran repeat simulations with different starting seeds (initial velocity assignments). the simulations performed are summarized in table s [supplementary information]. in figure a we present the most converged and apparently stable trajectory, comparing the initial structure with the last structure (after ns). a slight rotation of hbd- relative to the initial structure is indicated at ns by the transition in rmsd when plotted as a function of simulation time (figure b); however, for the remaining ns, hbd- stayed in the same position. analysis of the other three trajectories is provided in figure s . the comparison of main-chain fluctuations in rbd and hbd- between their bound and free states is shown in figure a. overall, the binding region becomes less flexible on the rbd in similar key loop regions whose dynamics are dampened by ace binding, while on the side of hbd- a significant number of main-chain sites also see their fluctuations decreased. the results are mapped to the final structure of the trajectory in figure b. as above, we calculated the number of intermolecular hydrogen bonds formed between hbd- and rbd over the course of the trajectory (figure c).
they are fewer, with an average ± , compared to those bridging the rbd:ace complex. similarly, with the exception of the hbd- residue arg , which forms a hydrogen bond with the ace residue glu greater than % of the time, the occupancy of other hydrogen bonds is reduced compared to the reference complex. as before, the occupancy of these interactions is not %; i.e., more like %, suggesting that they are somewhat dynamic (see discussion below) and are accompanied by indirect h-bond interactions with water molecules near or at the interface bridging the interactions (malik et al., ). both of these features were also found in simulations of the rbd:ace interaction, as already noted; however, the dynamics of these interactions appear to be more prevalent in the rbd:hbd- interaction. as might be expected for the cationic hbd- , the positively charged sidechains are a prominent feature in the interactions, especially arg and arg . the rbd residues most persistently involved in the interaction with hbd- are shown in the table of figure . with the exception of gln , the interaction between hbd- and rbd involves amino acids that are within a few residues of those that are involved between ace and rbd and cover a good proportion of the same interface area. the persistency of the complex is also confirmed in the changes in accessible surface area, which is buried between the two proteins, and fluctuates moderately around a value of ± Å . the value is smaller than that of ace ( Å ), indicating that less area is covered. this is expected since the hbd- protein is considerably smaller than the rbd. a distance map, comparing residues which are on average closer than Å in the rbd:ace and rbd:hbd- complexes is shown in figure . for the rbd:ace interaction (figure a), residues to , to , as well as a short stretch of residues around , and on ace bind with the rbd, whose binding interface ranges from residue to . 
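a distance map of this kind can be sketched as below. this is a minimal illustration under stated assumptions: each frame is a dictionary from a residue label to its atom coordinates, the residue labels and the cutoff value are hypothetical, and the criterion is the time-averaged minimum atom-atom distance between two residues.

```python
import math
from itertools import product

def min_distance(atoms_a, atoms_b):
    """Smallest atom-atom distance between two residues in one frame."""
    return min(math.dist(p, q) for p, q in product(atoms_a, atoms_b))

def contact_pairs(trajectory, residues_a, residues_b, cutoff):
    """Residue pairs whose minimum distance, averaged over frames, is below cutoff."""
    pairs = []
    for ra, rb in product(residues_a, residues_b):
        avg = sum(
            min_distance(frame[ra], frame[rb]) for frame in trajectory
        ) / len(trajectory)
        if avg < cutoff:
            pairs.append((ra, rb, avg))
    return pairs

# toy trajectory (hypothetical): two frames, one atom per residue
trajectory = [
    {"A1": [(0.0, 0.0, 0.0)], "A2": [(10.0, 0.0, 0.0)], "B1": [(3.0, 0.0, 0.0)]},
    {"A1": [(0.0, 0.0, 0.0)], "A2": [(10.0, 0.0, 0.0)], "B1": [(5.0, 0.0, 0.0)]},
]
contacts = contact_pairs(trajectory, ["A1", "A2"], ["B1"], cutoff=5.0)
```

running the same function on two complexes that share one partner would allow the two contact lists to be compared residue by residue.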
some of the rbd residues are in loop regions; e.g., and , which also come close to ace over the course of the ns simulation. the contact analysis for the rbd:hbd- complex over the course of the ns simulation is shown in figure b. remarkably, in comparison with the rbd:ace complex, essentially all residues of the rbd which contact ace , either the same ones or their close neighbors, are also in contact with hbd- . however, there are some subtle shifts. for example, rbd residues - make contact with ace but not with hbd- , where these interactions may have shifted to residue . also, a region of residues - contacts hbd- , which is not seen with ace . these contacts may be absent in the rbd:ace complex because it is less dynamic, and only sampled for ns. alternatively, they may provide a mechanistic entry for hbd- in replacing/competing away ace from the spike trimer. in order to confirm consistent binding of hbd- to rbd, we started simulations from the same initial structure of figure and repeated the simulation three more times, each with a different random seed (simulation details are shown in table s , and results are shown in figure s ). these simulations are consistent with the results above in terms of the rmsd and the average surface area buried. the change in fluctuations in forming the complex varied. the average numbers of hydrogen bonds, around ± at any one time, are slightly lower; however, as stated above, arg and arg are the major residues on hbd- contributing to the formation of hydrogen bonds.
rbd:hbd- (dimer): although the affinity of hbd- for dimerization is modest (hoover et al., ), it is possible that binding to the rbd stabilizes the dimeric form. we, therefore, also docked the hbd- dimer to the rbd and carried out simulations. the initial and final structure comparison of one of the simulations is shown in figure s a and the rmsd is plotted as a function of simulation time (figure s b). the comparison of the rmsf of the hbd- dimer and rbd in their bound state with their fluctuations in their free state is shown as well (figure s ). intriguingly, when compared to hbd- monomer binding, a few regions of the rbd do not diminish as much in flexibility, while some actually become more flexible. the buried accessible surface area is slightly larger (about % larger) for the dimer compared to monomer binding, confirming that interactions of the rbd with both units of the dimer exist. the distance map is given in figure s . as shown, both units of the hbd- dimer can bind with the rbd in the residue range of to . mostly dimer associated residues from to are in tight and close contact with the rbd. the number of hydrogen bonds formed between the hbd- dimer and the rbd is similar to that formed by the monomer and the rbd (figure c and figure s a). again the hbd- arg is the most prominent interacting residue. in fact, unit of the dimer can form more hydrogen bonds with the rbd, and also one hydrogen bond from unit is prominent, again involving its arg , this time to glu on rbd, which is outside the region typically interacting with ace . remarkably, the persistency of the hydrogen bonds is increased from ~ % in the monomer to ~ % in the dimer (shown in figure s ), suggesting overall that binding of a dimeric hbd- may be favorable.
rbd:hbd- -interaction energy calculation due to the caveats associated with calculations of free energy estimations from trajectories such as the ones run for this study, we carried out the binding interaction energy calculation for rbd binding with ace and hbd- monomer/dimer, respectively, using the popular gbsa method (see materials & methods section). we report the average energies and standard deviations as a histogram in figure s . these interaction energies have similar values and all are slightly negative. comparing the binding energy between rbd with ace and with hbd- monomer/dimer, the average binding energy of the rbd with ace is - ± kcal/mol whereas average binding energy of rbd with hbd- dimer is - ± kcal/mol, and similarly for rbd binding with hbd- monomer. however, it is likely that the entropy change upon binding rbd is significantly more favorable for binding to hbd- than binding to ace since the former is more dynamic in the bound state, giving less of an entropy penalty upon binding. in fact this latter indication suggests that peptides, which are initially unstructured in the unbound state could also maintain considerable flexibility in the bound state and may thus be powerful antagonists of the rbd:ace interaction. detailed thermodynamics analyses, both experimental and computational are needed to clarify this point. irrespective of these estimated numerical values, the calculations suggest that hbd- at a sufficiently high concentration should be able to block the binding of rbd with ace . our experimental analysis with rbd:hbd- interactions using purified proteins and the spike-pseudovirion assay suggests such a concentration is likely to be in the vicinity of the ic of . µm. experimental studies confirming the binding of hbd- with the rbd we used multiple experimental approaches to confirm the in silico findings of hbd- and sars-cov- rbd binding. 
microscale thermophoresis (mst) showed that cov- rbd interacts with recombinant hbd- (rhbd- ) with a dissociation constant of ~ nm (figure a). this interaction is weaker (> μm) when hbd- loses its natural conformation under disulfide bond reducing conditions (figure a). we then followed up using a functional elisa, and found that rhbd- bound to immobilized rbd in a linear range (over concentrations of . to nm), as detected by biotinylated anti-hbd- detection antibodies (figure b). we then examined the binding of rhbd and recombinant histidine tagged-rbd (his-rbd) derived from our expression system for codon optimized cov- rbd (see materials and methods) by co-immunoprecipitation. by incubating rhbd- with his-rbd at a ratio of . : . , followed by nickel bead immunoprecipitation of his-rbd and probing for hbd- in western blots, we found significant binding of hbd- to his-rbd (figure c). control western blots showed only modest background binding of hbd- in the absence of rbd, thereby confirming the specificity of the rbd:hbd- interaction (figure c).
hbd- blocks the binding of rbd with cellular ace
next, we examined whether rhbd- can interfere with the binding of rbd to the host ace receptor. we utilized hek t cells that overexpress the human ace receptor in the assays and incubated these cells with flag-rbd containing culture supernatant with and without rhbd- . we immunoprecipitated rbd through the flag tag and examined the co-precipitation of ace .
we found that flag-rbd effectively precipitated ace and the addition of hbd- competitively decreased rbd-ace binding (figure d). rbd levels were also decreased in the immunoprecipitate upon rhbd- addition, further suggesting a direct interaction of rbd with rhbd- , thereby preventing rbd-ace binding (figure d). hbd- specifically inhibits sars-cov- spike-mediated pseudoviral infection after discovering that rhbd- binds rbd and competitively inhibits rbd binding to ace , we investigated whether rhbd- can inhibit spike mediated pseudoviral entry into ace expressing cells. a luciferase reporter expressing cov- spike-dependent lentiviral system (crawford et al., ) was used to study the competitive inhibitory effects of rhbd- on cov- spike-mediated infection. we infected ace expressing hek t cells using the pseudotyped virus and found substantial luciferase activity in a viral dose dependent manner (figure a). next, we studied the effect of rhbd- on spike-dependent viral infection of ace /hek t cells by luciferase activity and found that hbd- decreased the spike mediated pseudoviral infection (figure b and c). to further validate that the inhibitory effect of hbd- is specific to a spike-mediated infection, we used a virus pseudotyped with vesicular stomatitis virus glycoprotein (vsvg) as an independent control. viruses pseudotyped with vsvg are pantropic; i.e., they can infect all cell types (lever et al., ), and do not depend on ace for entry. we obtained significant infection of ace /hek t cells using vsvg pseudotyped virus without or with the addition of rhbd- (figure d and e), thereby demonstrating the specificity of hbd- in blocking cov- spike glycoprotein mediated infection of ace expressing cells. we then inquired if increased inhibition of spike mediated pseudoviral entry was directly proportional to increased concentration of hbd- . 
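a dose-response question of this kind is commonly summarized by a half-maximal inhibitory concentration. the sketch below estimates it by log-linear interpolation between the two measured concentrations that bracket 50 % inhibition. this is a deliberately simple stand-in for a full sigmoidal (hill-type) fit and is not claimed to be the procedure used in the study; the concentrations and inhibition values are hypothetical.

```python
import math

def ic50_by_interpolation(concentrations, inhibition_percent):
    """Concentration giving 50 % inhibition, interpolated on log10(concentration).

    Concentrations must be positive; inhibition is given in percent.
    """
    points = sorted(zip(concentrations, inhibition_percent))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        # find the first pair of neighbouring points that brackets 50 %
        if y_lo <= 50.0 <= y_hi and y_hi != y_lo:
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("inhibition does not cross 50 % in the measured range")

# hypothetical dose-response data (arbitrary concentration units)
doses = [1.0, 10.0, 100.0]
inhibition = [20.0, 40.0, 60.0]
ic50 = ic50_by_interpolation(doses, inhibition)
```

interpolating on the logarithm of the concentration reflects the usual practice of plotting dose-response curves on a logarithmic axis; with replicate measurements, a fitted curve with confidence intervals would be the more rigorous choice.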
we discovered that indeed there was a clear hbd- dose response inhibition of pseudoviral entry (figure f and g), and that the inhibitory concentration (ic ) was approximately . ± . µm (figure h). at a concentration of µg/ml rhbd- decreased the spike- mediated pseudoviral infection by over % (figure h). discussion the human body expresses over a hundred amps that are found in either intracellular granules of professional phagocytes and/or in epithelial cells of mucosa lining our external and internal surfaces (dawgul et al., ). beta defensins and ll- , the only member of the cathelin amps expressed in humans, are localized to the mucosa of the oral cavity, nares and upper airway (diamond and ryan, ; ghosh et al., ; khurshid et al., ; lee et al., ; mathews et al., ; singh et al., ); i.e., sites deemed vulnerable to cov- entry and initial infection. indeed, these two types of amps, part of the epithelial cell’s arsenal of innate responses used to defend against viral challenges at mucosal sites, have been shown to interrupt viral infection of various viruses, including coronaviruses (kim et al., ). however, when a mucosal site becomes overwhelmed by a microbial threat, replenishing the amp armamentarium locally after initial release; i.e., time from transcriptional activation, translation, post- translational modification to rerelease, takes multiple hours and makes bystander cells more vulnerable to viral infection. moreover, if a microbial threat can inhibit production or release of these amps, it renders this innate defense useless. to overcome this, the amps or their mimetics, if administered exogenously in high enough concentrations, could be a sound therapeutic strategy to protect the host at vulnerable mucosal sites without eliciting an unwanted immunological response against the agent. 
interestingly, these same amps have been shown to be released by human mesenchymal stem cells (hmscs) (krasnodembskaya et al., ; sutton et al., ), recently repurposed to treat covid- patients. while hmscs have been shown to contribute to the recovery of severely ill cov- infected patients (moll et al., ; tsuchiya et al., ), the role that amps play and the mechanism by which hmscs ameliorate symptoms of covid- remains to be determined. however, the modulation of severe inflammation and microbicidal activity related to pulmonary disease are outcomes attributable to these amps (alcayaga-miranda et al., ; chow et al., ; krasnodembskaya et al., ; sutton et al., ). we chose to interrogate hbd- for its ability to block cov- from infecting vulnerable cells because of its innate role in protecting the oral cavity and the upper airway, and because its mouse ortholog has been shown to inhibit other coronaviruses (zhao et al., ). the computer simulations that we ran of hbd- and the rbd showed remarkable stability of the complex even after ns. there was also a clear overlap of binding sites when compared to the rbd:ace complex, as verified by analysis of protein-protein residue contact distance maps. multiple methods involving mst, elisa and immunoprecipitation followed by western blotting independently verified that hbd- binds to the rbd, thereby validating our in silico data. competitive inhibition assays were able to show that hbd- reduced rbd:ace binding by removing rbd from solution, which would otherwise be available for binding ace .
finally, by incorporating a luciferase reporter expressing cov- spike-dependent lentiviral system (lever et al., ), we demonstrated that hbd- inhibited viral entry into ace expressing hek t cells in a dose dependent manner, with an ic of ~ . µm. this concentration is much less than most other inhibitory concentrations attributed to hbd- antimicrobial activity (joly et al., ) and points to a favorable affinity, and possibly also avidity, of the interaction between hbd- and the rbd. interestingly, hbd- begins to show hemolytic activity at a concentration times greater ( µm) than our ic (koeninger et al., ), and shows no signs of cytotoxic effects against various other human cells (warnke et al., ) at over twice our ic (herrera et al., ; mi et al., ; sakamoto et al., ). this suggests a favorable therapeutic window for hbd- before unacceptable toxicity becomes an issue. clearly, next steps in conclusively showing the efficacy of hbd- against cov- would be to conduct live viral in vitro infections of ace expressing cells in a bsl facility followed by in vivo cov- infection studies in appropriate animal models (kim et al., ). in vivo application of hbd- has proven successful in addressing a number of diseases. this includes a recent study demonstrating efficacy in experimental colitis in a mouse model (koeninger et al., ) and therapeutic intranasal application of hbd- to reduce the influx of inflammatory cells into bronchoalveolar lavage fluid (pinkerton et al., ). of relevance to our study is the use of smaller hbd fragments; i.e., mimetics, of mouse beta defensin (mbd- ) (zhao et al., ), the ortholog of hbd- , that when administered intra-nasally, rescued % of mice from the lethal challenge of human and avian influenza a, sars-cov and mers-cov (lemessurier et al., ). 
therefore, should in vivo studies of hbd- prove to be successful in blocking live cov- infection in an animal model, the fact that the peptide is endogenous to humans and would not elicit an immunogenic response gives it a high probability of being safe and a quicker route to human clinical trials. in fact, several amps, as well as amp mimetics, are currently undergoing clinical trials for multiple different diseases (mookherjee et al., ). a recent in silico molecular docking study predicted a strong binding interaction between ll- and the rbd, demonstrating the blocking potential of ll- for ace binding (lokhande, ). this was followed up by a surface plasmon resonance study confirming the simulation results (roth et al., ). since ll- has also been shown to possess antiviral activity (tripathi et al., ), these results support the idea that more than one amp could be utilized, possibly in a “cocktail” to act as a potent viral blocking agent. recent findings also highlight that neuropilin- (nrp ), a receptor involved in multiple physiological processes and expressed on many cell types (roy et al., ), is being utilized by cov- to facilitate entry and infection (cantuti-castelvetri et al., ; daly et al., ). time will tell if blocking ace alone will be enough to reduce cov- infection and/or reduce the severity of symptoms, or if an additional strategy of also blocking entry via nrp will be required.
not unexpectedly, cov- is mutating, albeit at a relatively slower rate than influenza viruses; i.e., two to six fold slower over a given time frame (manzanares-meza and medina-contreras, ). ongoing studies indicate that it has developed a number of mutations of which have been associated with the rbd (chen et al., ; wang et al., ). furthermore, out of mutations are in the receptor-binding motif (rbm), i.e., the region of rbd that is in direct contact with ace , indicating that the virus may be accumulating mutations in that region to improve its interaction with ace (li et al., ). fortunately, while these and other mutations appear to have evolved for greater transmissibility, they have not resulted in greater pathogenicity. the variant that has recently received much attention is “vui- / ,” the one first reported in southeast england that presents with multiple amino acid changes to the spike protein (https://virological.org/t/preliminary-genomic-characterisation-of-an-emergent-sars- cov- -lineage-in-the-uk-defined-by-a-novel-set-of-spike-mutations/ ). while not confirmed yet in animal experiments, early reports suggest that it may be > % more infectious than the parent strain. of particular importance to us is the asparagine to tyrosine conversion in position (n y), as this is one of the contact residues within the rbm that plays a role in binding to ace . as shown in figure s , the bound hbd- monomer and dimer are on average not close to the sidechain site of residue (> a between nearest atoms). furthermore, compared to the ring-ring (pi-pi) contact between residue side-chains, which is highly probable between the uk-mutant rbd and ace , stabilizing the interaction as shown by a deep mutagenesis study with the n y mutation enhancing binding (starr et al., ), neither a sidechain ring nor a positively charged sidechain of hbd- appears to come near in our models of its complex with the (original) rbd.
at the same time it should be noted that the interaction of the rbd with ace , and especially with hbd- , is considerably dynamic (zhang et al., ; zhang and buck, ). although this has not yet been measured in the rbd:ace or rbd:hbd- platforms, the entropy of the interaction is likely to be not as unfavorable as seen in complexes where one or both partner proteins have to become significantly rigid. it is now becoming clear that many protein-protein complexes are inherently dynamic (zhang et al., ; zhang and buck, ), thus minimizing the unfavorable entropy change that would otherwise occur on binding. this is especially important for the binding of peptides, which may be relatively unstructured in solution and suggests that design of hbd- and ll- derived peptides would be a fruitful endeavor. while vaccines against sars-cov- have recently been approved by the fda and are planned for distribution and administration in a large scale to cover most of the american population over the next year, we see the amp strategy as complementary to vaccines. while the cov- vaccines appear to show > % efficacy, there will certainly be some degree of morbidity and mortality, as seen in all vaccines (kaselitz et al., ), many people will refuse vaccination (pogue et al., ; schwarzinger et al., ) and a significant number will either fail to mount effective neutralizing antibodies or high enough titers (goodwin et al., ; ndifon et al., ; ovsyannikova et al., ). many of these low or non- responders are predicted to be in the covid- high-risk population. additionally, vaccines more than likely will provide protection for a limited amount of time, as neutralizing antibodies wane, and many people could face reinfection. 
because of the multiple advantages of using small peptides like hbd- and their derived smaller mimetics, such as high specificity, low toxicity, lack of immunogenicity, low cost of production and ease of administration, they possess the potential for both safety and efficacy. molecules such as hbd- could be delivered, in the future, intra-orally and/or intra-nasally as prophylactic aerosols, in early stages of infection, when telltale symptoms appear and in combinatorial therapeutic approaches for more severe situations.
acknowledgements: we thank energy center (cesr) of tennessee technological university for partially supporting graduate student jackson penfield and the pilot fund from drs. weinberg and buck for undergraduate student ann brewer. the simulations were mainly done on ohio supercomputer center pitzer machines, and partly on high performance computers in tennessee technological university. we thank dr. jesse bloom, fred hutchinson cancer center for kindly providing the plasmids to generate spike pseudovirus and hek t cells expressing ace receptor, and dr. parvesh shrestha of the buck lab, for help with mst experiments. dr. buck is currently funded by nih r grant from the national eye institute r ey and his part of the project was also supported by pilot grant from the department of physiology and biophysics of case western reserve university. dr. ramakrishnan is supported by nih/niaid grants r ai and r ai , nih/nci grant r ca and a pilot funding from nord family foundation for covid related research. dr.
weinberg was supported by pilot funds from the department of biological sciences of the school of dental medicine, cwru. author contributions: conceptualization, lz, skg, pr, mb, aw. methodology, lz, skg, pr, mb. investigation, lz, scb, jm, jp, ab, skg, pr. writing – original draft, lz, skg, pr, aw. writing – review & editing, skg, pr, lz, mb, aw. visualization, skg, mb, pr, aw. supervision, lz, pr, mb, aw, project administration, aw. funding acquisition, lz, mb, pr, aw. declaration of interests: none to declare methods: resource availability further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, aaron weinberg (axw @case.edu). materials availability this study did not generate new unique reagents. cells hek t and hek t cells stably expressing ace receptor (ace hek t) were cultured in dmem media containing % fbs, u/ml penicillin/streptomycin and mm l-glutamine. plasmids phage-cmv-luc -ires-zsgreen-w, hdm-hgpm , hdm-tat b, prc-cmv-rev b, and sars-cov- spike-alayt plasmids were previously described (crawford et al., ). flag and his tagged rbd were expressed from a pcdna vector with leader sequence and leucine zipper as previously described (ramakrishnan et al., ). structure information the structure of human beta defensin (hbd- ) in the monomer and dimer form is available in the pdp with id fd (hoover et al., ). the hbd- sequence is residues long: gigdpvtclksgaichpvfcprrykqigtcglpgtkcckkp. the five boldened residues were found to form hydrogen bonds with the rbd during the simulations (see below/main paper). the structure of the rbd domain of the spike protein is also available in complex with ace at . Å resolution in the pdb with id m j (lan et al., ) .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. 
method details

docking and all-atom simulations
two docking programs were applied: cluspro (kozakov et al., ; kozakov et al., ; porter et al., ; vajda et al., ) and haddock (dominguez et al., ; van zundert et al., ). the x-ray structures of hbd- and of the sars-cov- s-protein rbd were uploaded to the cluspro docking webserver without additional preparation. the best docked structures were clustered, with most of them showing that hbd- binds to the rbd at sites used for the association between ace and rbd. the best structure was selected based on the docking programs’ score and the predicted binding sites between hbd- and rbd.

cluspro is a rigid body protein docking method. it is based on a fast fourier transform correlation approach, which makes it feasible to generate and evaluate billions of docked conformations using simple scoring functions, as shown in equation ( ). it implements a multistage protocol: rigid body docking (using piper), energy-based filtering, ranking of the retained structures based on clustering properties, and finally refinement of a limited number of structures by energy minimization. in the cluspro docking, the piper interaction energy is calculated using the following equation:

e = . e_rep - . e_att + e_elec + . e_dars ( )

here, e_rep and e_att are the repulsive and attractive contributions to the van der waals interaction energy, and e_elec is an electrostatic energy term. e_dars is a pairwise structure-based potential constructed by the decoys as the reference state (dars) method (chuang et al., ). it primarily represents a desolvation contribution, i.e., the free energy change due to the removal of the water molecules from the interface (kozakov et al., ).
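As an illustration only, the weighted sum in the PIPER energy expression above can be sketched in Python. The coefficient values are not reproduced in this text, so they are passed in by the caller; the names `PiperWeights` and `piper_energy`, and the numbers in the last line, are hypothetical placeholders and are not taken from cluspro or piper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiperWeights:
    """Coefficients of the four energy terms (hypothetical container; values supplied by the caller)."""
    w_rep: float
    w_att: float
    w_elec: float
    w_dars: float


def piper_energy(e_rep: float, e_att: float, e_elec: float, e_dars: float,
                 w: PiperWeights) -> float:
    """Weighted sum with the signs used in the text: +rep, -att, +elec, +dars."""
    return (w.w_rep * e_rep
            - w.w_att * e_att
            + w.w_elec * e_elec
            + w.w_dars * e_dars)


# Placeholder weights and energies, chosen only to exercise the function.
example = piper_energy(2.0, 3.0, 4.0, 5.0, PiperWeights(1.0, 1.0, 1.0, 1.0))
```

Keeping the weights in a separate object makes it explicit that the score is a linear combination whose balance between terms is a modelling choice.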
since the piper calculation used in cluspro docking does not include an entropic term, the piper energy should not be used to rank clusters. instead, clusters were ranked by their population. in our simulations, the rbd:hbd- complex structure from the top cluster was taken forward into all-atom molecular dynamics simulations.

in the haddock docking, since the binding interface between the ace receptor and rbd is known, residues from to on the rbd were selected as the target binding sites, while the entire hbd- peptide was taken as a potential binding site. default values were applied for all other parameters. the best structures by haddock scoring were then selected.

based on the best (including above from haddock and from cluspro docking) structures predicted above, all-atom molecular dynamics simulations were set up using the charmm m (huang et al., ) force field and the vmd program (humphrey et al., ). one of the deprotonated states of histidine was used (denoted hsd), and the native disulfide bonding in hbd- was set up. the protein was solvated with an equilibrated box of tip p water molecules such that the closest distance between atoms on the proteins and the edge of the simulation box is Å. the equivalent of . m in na and cl ions was added to the box, plus several ions to neutralize the net charge of the system. the target temperature is k and the target pressure is atm, maintained using standard thermo- and barostats. after a brief energy minimization using the conjugate gradient and line search algorithm, ps of dynamics was run at k, and the system was then brought up to k over an equilibration period of ns using the namd program version . (phillips et al., ). this was followed by trajectories that continued for up to or ns at atm and k using the npt ensemble. for comparison, we also simulated the rbd bound to ace using the structures from (lan et al., ) and the same method as above.
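The salt and neutralization step described above can be approximated with simple arithmetic: the number of ion pairs is roughly concentration × Avogadro's number × box volume, and extra counter-ions cancel the net charge of the solute. The sketch below is an assumption about how such a count could be estimated, not the procedure used in the study; the function name and the example inputs are hypothetical, and the volume occupied by the protein is ignored.

```python
AVOGADRO = 6.02214076e23          # particles per mole
LITERS_PER_CUBIC_ANGSTROM = 1e-27  # 1 Å^3 = 1e-30 m^3 = 1e-27 L


def ion_counts(conc_molar: float, box_volume_a3: float, net_charge: int):
    """Return (n_na, n_cl) for a target salt concentration plus neutralizing ions.

    conc_molar     salt concentration in mol/L
    box_volume_a3  simulation box volume in cubic angstroms
    net_charge     net charge of the solute in units of e
    """
    pairs = round(conc_molar * AVOGADRO * box_volume_a3 * LITERS_PER_CUBIC_ANGSTROM)
    n_na = pairs + max(-net_charge, 0)  # extra cations for a negative solute
    n_cl = pairs + max(net_charge, 0)   # extra anions for a positive solute
    return n_na, n_cl


# Illustrative inputs only; not the values used in the simulations.
counts = ion_counts(0.15, 1.0e6, 3)
```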
hbd- can also form a non-covalent dimer at high concentration in solution (hoover et al., ) (pdb id fd ). the initial bound structure of the hbd- dimer with the rbd was predicted using targeted haddock docking. the best predicted structure was used in all-atom md simulations as detailed above. the simulation systems, setup, number of atoms and box size information are shown in table s .

to analyze the trajectories, the root mean square deviation (rmsd) and root mean square fluctuations (rmsf) of the proteins were calculated using the vmd program and an in-house analysis script, based on the coordinates of the backbone cα atoms after aligning the trajectories, respectively, to the original crystal structures of the rbd and hbd- and to the initial complex structure of the rbd and hbd- predicted from docking. the buried surface area (bsa) for the complex was calculated in two steps using the vmd program and a script implementing the lee and richards method with a water probe size of . Å (lee and richards, ). first, the total solvent accessible surface area of the complex (asa_complex) was calculated from the complex’s trajectory. second, the accessible surface area of each protein (asa_rbd, asa_hbd) was calculated for each protein individually. the bsa was then calculated using equation ( ):

bsa = . * (asa_rbd + asa_hbd - asa_complex) ( )

the number of hydrogen bonds between the rbd and ace or between the rbd and hbd- was calculated using the vmd program with a heavy atom distance cutoff of . Å and an angle cutoff of degrees deviation from h-bond linearity.
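The buried surface area expression above can be sketched per trajectory frame as follows. This is a minimal illustration, assuming the three accessible surface areas have already been computed for each frame (for example with a vmd script); the function name is hypothetical, and because the prefactor is not reproduced in this text it is a required argument.

```python
def buried_surface_area(asa_rbd, asa_hbd, asa_complex, prefactor):
    """Per-frame BSA = prefactor * (ASA_rbd + ASA_hbd - ASA_complex).

    Each ASA argument is a sequence with one value per frame (same units, e.g. Å^2).
    """
    if not (len(asa_rbd) == len(asa_hbd) == len(asa_complex)):
        raise ValueError("all ASA series must have the same number of frames")
    return [prefactor * (r + h - c)
            for r, h, c in zip(asa_rbd, asa_hbd, asa_complex)]


# Made-up numbers for two frames, with a placeholder prefactor of 1.0.
bsa = buried_surface_area([100.0, 110.0], [50.0, 55.0], [120.0, 130.0], 1.0)
```

The difference is positive whenever surface that is exposed in the isolated proteins becomes hidden in the complex.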
the fraction of the simulation during which a particular h-bond is formed was monitored and is expressed as % occupancy. to identify the residues at the binding interface, the closest distance between any atoms (including hydrogens) of each residue pair between the rbd and hbd- was calculated and averaged over the trajectory. the average distances between each residue on the rbd and on hbd- are shaded by proximity on a red to white color scale and were used to build the distance maps. furthermore, based on the long-term simulation trajectories of the complexes in supplementary table s , the total pairwise interaction energy was calculated using the mm-gbsa method (genheden and ryde, ) by applying namd and the namd energy plugin of the vmd program (humphrey et al., ). this interaction energy (e_binding) is calculated using equation ( ):

e_binding = <e_complex> - <e_protein> - <e_ligand> ( )

e_complex is the potential energy of the protein-ligand complex, e_protein is the potential energy of the protein, and e_ligand is the potential energy of the ligand. < > denotes the ensemble average over simulation time. in the mm-gbsa method, the solvent effect was accounted for using the generalized born implicit solvent model (gbis) (tanner et al., ).

measurement of rbd:ace association in vitro
untagged hbd- and n-terminally his-tagged rbd were purchased from peprotech, inc. and raybiotech inc., respectively. binding experiments were carried out with a monolith nt. microscale thermophoresis (mst) instrument (nanotemper, inc.) at room temperature in ph . phosphate buffered saline with . % tween- (pbs-t . %). the rbd was labeled using the nanotemper monolith his-tag labeling kit red-trisnta, which labels his-tags with a fluorescent group. nm of this rbd was mixed with a serial dilution of unlabeled hbd- in . ml micro reaction tubes (nanotemper, inc.) and then transferred to premium capillaries (nanotemper, inc.). the experiment was done with a triplicate set of tubes.
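Two of the trajectory analyses described in the simulation methods above reduce to simple averages over frames: h-bond occupancy (percentage of frames in which a bond is present) and the interaction energy (ensemble average of the complex energy minus those of the protein and the ligand). The sketch below assumes the per-frame values have already been extracted from the trajectory; the function names and the sample numbers are hypothetical and are not outputs of vmd or namd.

```python
from statistics import fmean


def hbond_occupancy_percent(present_per_frame):
    """Percentage of frames in which a given hydrogen bond is formed."""
    frames = list(present_per_frame)
    if not frames:
        raise ValueError("need at least one frame")
    return 100.0 * sum(1 for present in frames if present) / len(frames)


def interaction_energy(e_complex, e_protein, e_ligand):
    """<E_complex> - <E_protein> - <E_ligand>, averaging each series over frames."""
    return fmean(e_complex) - fmean(e_protein) - fmean(e_ligand)


# Made-up per-frame data, for illustration only.
occupancy = hbond_occupancy_percent([True, False, True, True])
e_binding = interaction_energy([-10.0, -12.0], [-3.0, -5.0], [-1.0, -1.0])
```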
microscale thermophoresis monitors the change in the diffusion of proteins/peptides in microscopic temperature gradients upon protein binding. the dissociation constant kd was obtained by fitting the binding curve with the quadratic solution for the fraction of fluorescent molecules that formed the complex between proteins a and t, calculated from the law of mass action, kd = [a]*[t]/[at], where [a] is the concentration of free fluorescent molecule, [t] is the concentration of free titrant and [at] is the concentration of the complex of a and t. we also carried out the experiment with a labeled rbd as well as an hbd- sample which had its disulphide bonds reduced by addition of . mm dtt, showing that disulphide bonds are essential for maintaining the folded structures (hati and bhattacharyya, ) and that these are required for the reasonably strong protein-protein interactions (figure a).

elisa based assay
µl of rhbd- (peprotech, inc.) (concentration as indicated in figure b) in assay diluent buffer (r&d systems, inc.) was incubated in an rbd coated plate (raybiotech, inc.) at c for hrs. plates were then washed times with µl of wash buffer (r&d systems, inc.), followed by incubation with µl of biotinylated anti hbd- (peprotech, inc.) [ . µg/ml] for hr. plates were then washed again as stated above and incubated with µl of streptavidin-hrp (r&d systems, inc.) for minutes. signal was developed using tmb substrate and measured at nm using a microplate reader.

immunoprecipitation and western blotting
to study interaction between hbd- and his-rbd, recombinant hbd- (peprotech, inc.)
with or without recombinant his-tagged-rbd (sino biologicals, inc.) was pre-incubated at room temperature for h in binding buffer ( mm hepes ph . , mm mgcl , mm nacl, . mm dithiothreitol, % triton x- and mm edta) and then incubated with washed ni-nta agarose resin beads ( µl) overnight at °c. beads were collected by centrifugation at rpm for min and washed thrice with binding buffer. beads were boiled with µl of laemmli sample buffer and analyzed by western blotting (wb). briefly, samples were separated on % sds-polyacrylamide gels and proteins were then transferred to nitrocellulose membrane ( . µm pore size) at v for min in the cold. membranes were blocked with % milk in tbst and then probed with goat anti-human bd antibody ( . µg/ml; peprotech), followed by secondary antibody ( : ) at room temperature, and visualized by enhanced chemiluminescence.

to study the ability of hbd to compete with rbd binding to ace , ace hek t cells were seeded in cm plates. at % confluency, media was replaced with conditioned media from hek t cells transfected with secreted flag rbd plasmid or control media in the presence or absence of hbd ( . and . µg/ml) and incubated at °c for min. cells were washed and collected in pbs-edta solution and then lysed in triton lysis buffer. lysates were centrifuged at g for min at °c, and immunoprecipitated using m flag beads (sigma) for hours at °c. beads were collected, washed, boiled with laemmli sample buffer and analyzed by western blotting.

cov- spike-pseudotyped luciferase assay
pseudotyped sars-cov- spike virus was generated and the luciferase assay was carried out as described previously (crawford et al., ). briefly, hek t cells were transfected with luciferase-ires-zsgreen, hdm-hgpm , hdm-tat b, prc-cmv-rev b, and sars-cov- spike-alayt plasmids as described (crawford et al., ). culture supernatants were harvested hours after transfection and used to infect ace hek t cells.
to study the effect of hbd on spike pseudotyped virus entry, ace hek t cells were incubated with pseudovirions and varying concentrations of hbd ( - µg/ml) for hours. cells were lysed and luminescence was measured using the luciferase assay system following the manufacturer’s instructions (promega, inc.) in a spectramax i microplate detection platform (molecular devices, inc.).

figure . molecular dynamics simulations of rbd:ace (as a reference) show that the protein complex is stable. (a) rmsf of rbd (left) and ace (right) in the complex over ns in comparison with values for the unbound (free) proteins; the secondary structure of ace and rbd are indicated. (b) difference in rmsf between bound and free proteins. the data are mapped to the cartoon representation of the complex with color bar (bottom) indicating the range of - . Å (in blue) to . Å (in red). (c) number of hydrogen bonds for the rbd bound to ace over the course of the simulation. (d) table of most prominent h-bonds and their occupancy.

figure . cartoon representation of rbd:hbd- .
(a) comparison of the initial and final structures of the rbd:hbd complex after ns of all-atom md simulation (initial structure shown in cyan for hbd- and green for rbd; final structure shown in magenta for hbd- and raspberry for rbd). (b) rmsd of proteins in the complex and of the complex itself.

figure . the rbd and hbd- proteins retain considerable dynamics as a complex. (a) rmsf of rbd (left) and hbd- (right) in the complex over ns in comparison with values for the unbound (free) proteins; the secondary structure of ace and rbd are indicated. (b) difference in rmsf between bound and free proteins. the data are mapped to the cartoon representation of the complex with color bar (bottom) indicating the range of - . Å (in blue) to . Å (in red). (c) number of hydrogen bonds for the rbd bound to hbd- over the simulation. (d) table of most prominent h-bonds and their occupancy.

figure . similar regions/residues are involved in rbd contact with ace as with hbd- . distance maps of inter-protein contacts in (a) the rbd:ace complex and (b) the rbd:hbd- complex, with distances color coded by average proximity over the length of the simulations (see color scale, right).
figure . biophysical and biological assays demonstrating hbd- binding to rbd. (a) concentration dependent binding of recombinant hbd- (rhbd- ) to fluorescently labeled recombinant rbd (rrbd), as measured by microscale thermophoresis. hbd- was used under oxidizing (black data points) and under reducing conditions (red). (b) functional elisa assay showing that rhbd- binds to immobilized rrbd with a linear range of concentrations ( . to nm). (c) recombinant his-rbd ( µg) and hbd- ( . µg) were incubated as described in methods and precipitated with ni-nta beads to pull down his-tagged-rbd. co-precipitation of hbd- was assessed by western blotting. lane shows % input of hbd- and lane shows ni-nta precipitation to examine background binding of hbd- to the beads. data are representative of three independent experiments. (d) ace hek t cells were incubated with flag-rbd, with and without hbd- at the indicated concentrations. anti-flag immunoprecipitation was performed to precipitate ace bound to flag-rbd and to assess the effect of hbd- addition on rbd:ace binding. data are representative of two biological replicates.

figure . hbd inhibits cov- spike-pseudotyped virus entry into ace t cells.
(a) ace hek t cells were infected with cov- spike-pseudotyped virus and luciferase activity was assessed at hours post infection. (b) effect of hbd on cov- spike-pseudotyped virus cell entry was assessed as in (a). (c) percentage infection was calculated from the rlu values in (b) taking the spike alone group as %. (d) effect of hbd on vsvg-pseudotyped virus entry was assessed as in (a). (e) percentage infection was calculated from the rlu values in (d) taking the vsvg alone group as %. (f) titration of hbd concentration ( - µg/ml) on spike-mediated pseudovirus entry and luciferase activity. (g) percentage of spike infection was calculated from the rlu values in (f) taking the spike alone group as %. (h) hbd -mediated percent inhibition of spike viral entry and ic were calculated by plotting hbd concentration (in µm) against % inhibition observed. values given are mean ± sem of two independent experiments done in triplicate. ***p < . , **p < . , *p < . , and ns (non-significant) against the cov- spike-pseudotyped virus alone treated group.

references: alcayaga-miranda, f., cuenca, j., and khoury, m. ( ). antimicrobial activity of mesenchymal stem cells: current status and new perspectives of antimicrobial peptide-based therapies. front immunol , . amaro, r.e., and mulholland, a.j. ( ). biomolecular simulations in the time of covid , and after. comput sci eng , - . barros, e.p., casalino, l., gaieb, z., dommer, a.c., wang, y., fallon, l., raguette, l., belfon, k., simmerling, c., and amaro, r.e. ( ). the flexibility of ace in the context of sars-cov- infection. biophys j.
bensch, k.w., raida, m., mägert, h.j., schulz-knappe, p., and forssmann, w.g. ( ). hbd- : a novel beta- defensin from human plasma. febs lett , - . brielle, e.s., schneidman-duhovny, d., and linial, m. ( ). the sars-cov- exerts a distinctive strategy for interacting with the ace human receptor. viruses . cantuti-castelvetri, l., ojha, r., pedro, l.d., djannatian, m., franz, j., kuivanen, s., van der meer, f., kallio, k., kaya, t., anastasina, m., et al. ( ). neuropilin- facilitates sars-cov- cell entry and infectivity. science, eabd . chen, j., wang, r., wang, m., and wei, g.w. ( ). mutations strengthened sars-cov- infectivity. j mol biol , - . chow, l., johnson, v., impastato, r., coy, j., strumpf, a., and dow, s. ( ). antibacterial activity of human mesenchymal stem cells mediated directly by constitutively secreted factors and indirectly by activation of innate immune effector cells. stem cells transl med , - . chuang, g.y., kozakov, d., brenke, r., comeau, s.r., and vajda, s. ( ). dars (decoys as the reference state) potentials for protein-protein docking. biophys j , - . crawford, k.h.d., eguia, r., dingens, a.s., loes, a.n., malone, k.d., wolf, c.r., chu, h.y., tortorici, m.a., veesler, d., murphy, m., et al. ( ). protocol and reagents for pseudotyping lentiviral particles with sars-cov- spike protein for neutralization assays. viruses . daly, j.l., simonetti, b., klein, k., chen, k.-e., williamson, m.k., antón-plágaro, c., shoemark, d.k., simón-gracia, l., bauer, m., hollandi, r., et al. ( ). neuropilin- is a host factor for sars-cov- infection. science, eabd . dawgul, m.a., greber, k.e., sawicki, w., and kamysz, w. ( ). human host defense peptides - role in maintaining human homeostasis and pathological processes. curr med chem. diamond, g., beckloff, n., and ryan, l.k. ( ). host defense peptides in the oral cavity and the lung: similarities and differences. j dent res , - . diamond, g., beckloff, n., weinberg, a., and kisich, k.o. ( ). 
the roles of antimicrobial peptides in innate host defense. curr pharm des , - . diamond, g., and ryan, l. ( ). beta-defensins: what are they really doing in the oral cavity? oral dis , - . dominguez, c., boelens, r., and bonvin, a.m. ( ). haddock: a protein-protein docking approach based on biochemical or biophysical information. j am chem soc , - . doss, m., white, m.r., tecle, t., and hartshorn, k.l. ( ). human defensins and ll- in mucosal immunity. j leukoc biol , - . genheden, s., and ryde, u. ( ). the mm/pbsa and mm/gbsa methods to estimate ligand-binding affinities. expert opin drug discov , - . ghorbani, m., brooks, b.r., and klauda, j.b. ( ). critical sequence hotspots for binding of novel coronavirus to angiotensin converter enzyme as evaluated by molecular simulations. j phys chem b , - . ghosh, s.k., gerken, t.a., schneider, k.m., feng, z., mccormick, t.s., and weinberg, a. ( ). quantification of human beta-defensin- and - in body fluids: application for studies of innate immunity. clin chem , - . goodwin, k., viboud, c., and simonsen, l. ( ). antibody response to influenza vaccination in the elderly: a quantitative review. vaccine , - . gross, l.z.f., sacerdoti, m., piiper, a., zeuzem, s., leroux, a.e., and biondi, r.m. ( ). ace , the receptor that enables infection by sars-cov- : biochemistry, structure, allostery and evaluation of the potential development of ace modulators. chemmedchem , - . harder, j., bartels, j., christophers, e., and schroder, j.m. ( ). isolation and characterization of human beta-defensin- , a novel human inducible peptide antibiotic.
j biol chem , - . harder, j., bartels, j., christophers, e., and schröder, j.m. ( ). a peptide antibiotic from human skin. nature , . hati, s., and bhattacharyya, s. ( ). impact of thiol–disulfide balance on the binding of covid- spike protein with angiotensin-converting enzyme receptor. acs omega , - . herrera, r., morris, m., rosbe, k., feng, z., weinberg, a., and tugizov, s. ( ). human beta-defensins and - cointernalize with human immunodeficiency virus via heparan sulfate proteoglycans and reduce infectivity of intracellular virions in tonsil epithelial cells. virology , - . hoover, d.m., rajashankar, k.r., blumenthal, r., puri, a., oppenheim, j.j., chertov, o., and lubkowski, j. ( ). the structure of human beta-defensin- shows evidence of higher order oligomerization. j biol chem , - . huang, j., rauscher, s., nawrocki, g., ran, t., feig, m., de groot, b.l., grubmüller, h., and mackerell, a.d., jr. ( ). charmm m: an improved force field for folded and intrinsically disordered proteins. nat methods , - . humphrey, w., dalke, a., and schulten, k. ( ). vmd: visual molecular dynamics. j mol graph , - , - . joly, s., maze, c., mccray, p.b., jr., and guthmiller, j.m. ( ). human beta-defensins and demonstrate strain- selective activity against oral microorganisms. j clin microbiol , - . kaselitz, t.b., martin, e.t., power, l.e., and cinti, s. ( ). impact of vaccination on morbidity and mortality in adults hospitalized with influenza a, – . infectious diseases in clinical practice , - . khurshid, z., naseem, m., yahya, i.a.f., mali, m., sannam khan, r., sahibzada, h.a., zafar, m.s., faraz moin, s., and khan, e. ( ). significance and diagnostic role of antimicrobial cathelicidins (ll- ) peptides in oral health. biomolecules . kim, j., yang, y.l., jang, s.h., and jang, y.s. ( ). human β-defensin plays a regulatory role in innate antiviral immunity and is capable of potentiating the induction of antigen-specific immunity. virol j , . 
kim, y.i., kim, s.g., kim, s.m., kim, e.h., park, s.j., yu, k.m., chang, j.h., kim, e.j., lee, s., casel, m.a.b., et al. ( ). infection and rapid transmission of sars-cov- in ferrets. cell host microbe , - .e . koeninger, l., armbruster, n.s., brinch, k.s., kjaerulf, s., andersen, b., langnau, c., autenrieth, s.e., schneidawind, d., stange, e.f., malek, n.p., et al. ( ). human β-defensin mediated immune modulation as treatment for experimental colitis. front immunol , . kota, s., sabbah, a., chang, t.h., harnack, r., xiang, y., meng, x., and bose, s. ( ). role of human beta-defensin- during tumor necrosis factor-alpha/nf-kappab-mediated innate antiviral response against human respiratory syncytial virus. j biol chem , - . kozakov, d., brenke, r., comeau, s.r., and vajda, s. ( ). piper: an fft-based protein docking program with pairwise potentials. proteins , - . kozakov, d., hall, d.r., xia, b., porter, k.a., padhorny, d., yueh, c., beglov, d., and vajda, s. ( ). the cluspro web server for protein-protein docking. nat protoc , - . krasnodembskaya, a., song, y., fang, x., gupta, n., serikov, v., lee, j.w., and matthay, m.a. ( ). antibacterial effect of human mesenchymal stem cells is mediated in part from secretion of the antimicrobial peptide ll- . stem cells , - . lan, j., ge, j., yu, j., shan, s., zhou, h., fan, s., zhang, q., shi, x., wang, q., zhang, l., et al. ( ). structure of the sars-cov- spike receptor-binding domain bound to the ace receptor. nature , - . lee, b., and richards, f.m. ( ). the interpretation of protein structures: estimation of static accessibility. j mol biol , - . lee, s.h., kim, j.e., lim, h.h., lee, h.m., and choi, j.o. ( ). antimicrobial defensin peptides of the human nasal mucosa. ann otol rhinol laryngol , - .
leikina, e., delanoe-ayari, h., melikov, k., cho, m.s., chen, a., waring, a.j., wang, w., xie, y., loo, j.a., lehrer, r.i., et al. ( ). carbohydrate-binding molecules inhibit viral fusion and entry by crosslinking membrane glycoproteins. nat immunol , - . lemessurier, k.s., lin, y., mccullers, j.a., and samarasinghe, a.e. ( ). antimicrobial peptides alter early immune response to influenza a virus infection in c bl/ mice. antiviral res , - . lever, a.m., strappe, p.m., and zhao, j. ( ). lentiviral vectors. j biomed sci , - . li, q., wu, j., nie, j., zhang, l., hao, h., liu, s., zhao, c., zhang, q., liu, h., nie, l., et al. ( ). the impact of mutations in sars-cov- spike on viral infectivity and antigenicity. cell , - .e . lokhande, k.b.b., tanushree; swamy, k. venkateswara; deshpande, manisha ( ). an in silico scientific basis for ll- as a therapeutic and vitamin d as preventive for covid- . chemrxiv. malik, a., prahlad, d., kulkarni, n., and kayal, a. ( ). interfacial water molecules make rbd of spike protein and human ace to stick together. biorxiv, . . . . manzanares-meza, l.d., and medina-contreras, o. ( ). sars-cov- and influenza: a comparative overview and treatment implications. bol med hosp infant mex , - . mathews, m., jia, h.p., guthmiller, j.m., losh, g., graham, s., johnson, g.k., tack, b.f., and mccray, p.b., jr. ( ). production of beta-defensin antimicrobial peptides by the oral mucosa and salivary glands. infect immun , - . mccallum, m., walls, a.c., bowen, j.e., corti, d., and veesler, d. ( ). structure-guided covalent stabilization of coronavirus spike glycoprotein trimers in the closed conformation. nat struct mol biol. mi, b., liu, j., liu, y., hu, l., liu, y., panayi, a.c., zhou, w., and liu, g. ( ).
the designer antimicrobial peptide a-hbd- facilitates skin wound healing by stimulating keratinocyte migration and proliferation. cell physiol biochem , - . moll, g., drzeniek, n., kamhieh-milz, j., geissler, s., volk, h.d., and reinke, p. ( ). msc therapies for covid- : importance of patient coagulopathy, thromboprophylaxis, cell product quality and mode of delivery for treatment safety and efficacy. front immunol , . mookherjee, n., anderson, m.a., haagsman, h.p., and davidson, d.j. ( ). antimicrobial host defence peptides: functions and clinical potential. nature reviews drug discovery , - . mulder, k.c., lima, l.a., miranda, v.j., dias, s.c., and franco, o.l. ( ). current scenario of peptide-based drugs: the key roles of cationic antitumor and antiviral peptides. front microbiol , . ndifon, w., wingreen, n.s., and levin, s.a. ( ). differential neutralization efficiency of hemagglutinin epitopes, antibody interference, and the design of influenza vaccines. proc natl acad sci u s a , - . ooi, c.y., pang, t., leach, s.t., katz, t., day, a.s., and jaffe, a. ( ). fecal human β-defensin in children with cystic fibrosis: is there a diminished intestinal innate immune response? dig dis sci , - . ovsyannikova, i.g., schaid, d.j., larrabee, b.r., haralambieva, i.h., kennedy, r.b., and poland, g.a. ( ). a large population-based association study between hla and kir genotypes and measles vaccine antibody responses. plos one , e . phillips, j.c., braun, r., wang, w., gumbart, j., tajkhorshid, e., villa, e., chipot, c., skeel, r.d., kalé, l., and schulten, k. ( ). scalable molecular dynamics with namd. j comput chem , - . pinkerton, j.w., kim, r.y., koeninger, l., armbruster, n.s., hansbro, n.g., brown, a.c., jayaraman, r., shen, s., malek, n., cooper, m.a., et al. ( ). human β-defensin- suppresses key features of asthma in murine models of allergic airways disease. clin exp allergy. 
pogue, k., jensen, j.l., stancil, c.k., ferguson, d.g., hughes, s.j., mello, e.j., burgess, r., berges, b.k., quaye, a., and poole, b.d. ( ). influences on attitudes regarding potential covid- vaccination in the united states. vaccines (basel) . porter, k.a., xia, b., beglov, d., bohnuud, t., alam, n., schueler-furman, o., and kozakov, d. ( ). cluspro peptidock: efficient global docking of peptide recognition motifs using fft. bioinformatics , - . quiñones-mateu, m.e., lederman, m.m., feng, z., chakraborty, b., weber, j., rangel, h.r., marotta, m.l., mirza, m., jiang, b., kiser, p., et al. ( ). human epithelial beta-defensins and inhibit hiv- replication. aids , f - . ramakrishnan, p., wang, w., and wallach, d. ( ). receptor-specific signaling for both the alternative and the canonical nf-kappab activation pathways by nf-kappab-inducing kinase. immunity , - . ray, d., le, l., and andricioaei, i. ( ). distant residues modulate the conformational opening in sars-cov- spike protein. biorxiv, . . . . rivas-santiago, b., schwander, s.k., sarabia, c., diamond, g., klein-patel, m.e., hernandez-pando, r., ellner, j.j., and sada, e. ( ). human β-defensin is expressed and associated with mycobacterium tuberculosis during infection of human alveolar epithelial cells. infect immun , - . roth, a., lütke, s., meinberger, d., hermes, g., sengle, g., koch, m., streichert, t., and klatt, a.r. ( ). ll- fights sars-cov- : the vitamin d-inducible peptide ll- inhibits binding of sars-cov- spike protein to its cellular receptor angiotensin converting enzyme in vitro. biorxiv, . . . .
roy, s., bag, a.k., singh, r.k., talmadge, j.e., batra, s.k., and datta, k. ( ). multifaceted role of neuropilins in the immune system: potential targets for immunotherapy. front immunol , . ryan, l.k., dai, j., yin, z., megjugorac, n., uhlhorn, v., yim, s., schwartz, k.d., abrahams, j.m., diamond, g., and fitzgerald-bocarsly, p. ( ). modulation of human beta-defensin- (hbd- ) in plasmacytoid dendritic cells (pdc), monocytes, and epithelial cells by influenza virus, herpes simplex virus, and sendai virus and its possible role in innate immunity. j leukoc biol , - . sakamoto, n., mukae, h., fujii, t., ishii, h., yoshioka, s., kakugawa, t., sugiyama, k., mizuta, y., kadota, j., nakazato, m., et al. ( ). differential effects of alpha- and beta-defensin on cytokine production by cultured human bronchial epithelial cells. am j physiol lung cell mol physiol , l - . sawai, m.v., jia, h.p., liu, l., aseyev, v., wiencek, j.m., mccray, p.b., jr., ganz, t., kearney, w.r., and tack, b.f. ( ). the nmr structure of human beta-defensin- reveals a novel alpha-helical segment. biochemistry , - . schibli, d.j., hunter, h.n., aseyev, v., starner, t.d., wiencek, j.m., mccray, p.b., jr., tack, b.f., and vogel, h.j. ( ). the solution structures of the human beta-defensins lead to a better understanding of the potent bactericidal activity of hbd against staphylococcus aureus. j biol chem , - . schwarzinger, m., flicoteaux, r., cortarenoda, s., obadia, y., and moatti, j.p. ( ). low acceptability of a/h n pandemic vaccination in french adult population: did public health policy fuel public dissonance? plos one , e . semple, f., and dorin, j.r. ( ). β-defensins: multifunctional modulators of infection, inflammation and more? j innate immun , - . singh, p.k., jia, h.p., wiles, k., hesselberth, j., liu, l., conway, b.a., greenberg, e.p., valore, e.v., welsh, m.j., ganz, t., et al. ( ). production of beta-defensins by human airway epithelia. proc natl acad sci u s a , - . 
siu, y.l., teoh, k.t., lo, j., chan, c.m., kien, f., escriou, n., tsao, s.w., nicholls, j.m., altmeyer, r., peiris, j.s., et al. ( ). the m, e, and n structural proteins of the severe acute respiratory syndrome coronavirus are required for efficient assembly, trafficking, and release of virus-like particles. j virol , - . spinello, a., saltalamacchia, a., and magistrato, a. ( ). is the rigidity of sars-cov- spike receptor-binding motif the hallmark for its enhanced infectivity? insights from all-atom simulations. j phys chem lett , - . starr, t.n., greaney, a.j., hilton, s.k., ellis, d., crawford, k.h.d., dingens, a.s., navarro, m.j., bowen, j.e., tortorici, m.a., walls, a.c., et al. ( ). deep mutational scanning of sars-cov- receptor binding domain reveals constraints on folding and ace binding. cell , - .e . sutton, m.t., fletcher, d., ghosh, s.k., weinberg, a., van heeckeren, r., kaur, s., sadeghi, z., hijaz, a., reese, j., lazarus, h.m., et al. ( ). antimicrobial properties of mesenchymal stem cells: therapeutic potential for cystic fibrosis infection, and treatment. stem cells int , . tai, w., he, l., zhang, x., pu, j., voronin, d., jiang, s., zhou, y., and du, l. ( ). characterization of the receptor- binding domain (rbd) of novel coronavirus: implication for development of rbd protein as a viral attachment inhibitor and vaccine. cell mol immunol , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / tanner, d.e., chan, k.y., phillips, j.c., and schulten, k. ( ). parallel generalized born implicit solvent calculations with namd. j chem theory comput , - . 
tripathi, s., wang, g., white, m., qi, l., taubenberger, j., and hartshorn, k.l. ( ). antiviral activity of the human cathelicidin, ll- , and derived peptides on seasonal and pandemic influenza a viruses. plos one , e . tsuchiya, a., takeuchi, s., iwasawa, t., kumagai, m., sato, t., motegi, s., ishii, y., koseki, y., tomiyoshi, k., natsui, k., et al. ( ). therapeutic potential of mesenchymal stem cells and their exosomes in severe novel coronavirus disease (covid- ) cases. inflamm regen , . vajda, s., yueh, c., beglov, d., bohnuud, t., mottarella, s.e., xia, b., hall, d.r., and kozakov, d. ( ). new additions to the cluspro server motivated by capri. proteins , - . van zundert, g.c.p., rodrigues, j., trellet, m., schmitz, c., kastritis, p.l., karaca, e., melquiond, a.s.j., van dijk, m., de vries, s.j., and bonvin, a. ( ). the haddock . web server: user-friendly integrative modeling of biomolecular complexes. j mol biol , - . walls, a.c., park, y.j., tortorici, m.a., wall, a., mcguire, a.t., and veesler, d. ( ). structure, function, and antigenicity of the sars-cov- spike glycoprotein. cell , - .e . walls, a.c., tortorici, m.a., bosch, b.j., frenz, b., rottier, p.j.m., dimaio, f., rey, f.a., and veesler, d. ( ). cryo- electron microscopy structure of a coronavirus spike glycoprotein trimer. nature , - . wang, r., hozumi, y., yin, c., and wei, g.w. ( ). decoding sars-cov- transmission and evolution and ramifications for covid- diagnosis, vaccine, and medicine. j chem inf model. warnke, p.h., voss, e., russo, p.a., stephens, s., kleine, m., terheyden, h., and liu, q. ( ). antimicrobial peptide coating of dental implants: biocompatibility assessment of recombinant human beta defensin- for human cells. int j oral maxillofac implants , - . wrapp, d., wang, n., corbett, k.s., goldsmith, j.a., hsieh, c.l., abiona, o., graham, b.s., and mclellan, j.s. ( ). cryo-em structure of the -ncov spike in the prefusion conformation. science , - . 
coordination of phage genome degradation versus host genome protection by a bifunctional restriction-modification enzyme visualized by cryoem betty w. shen , joel d.
quispe , yvette luyten , benjamin e. mcgough , richard d. morgan and barry l. stoddard ,* division of basic sciences fred hutchinson cancer research center fairview ave. n. seattle wa usa department of biochemistry university of washington seattle wa usa new england biolabs county road ipswich, ma usa scientific computing fred hutchinson cancer research center fairview ave. n. seattle, wa usa * corresponding author: bstoddar@fredhutch.org - - (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . abstract restriction enzymes that combine dna methylation and cleavage activities into a single polypeptide or protein assemblage and that modify just one dna strand for host protection are capable of more efficient adaptation towards novel target sites. however, they must solve the problem of discrimination between newly replicated and unmodified host sites (needing methylation) and invasive foreign sites (needing cleavage). one solution to this problem might be that the activity that occurs at any given site is dictated by the oligomeric state of the bound enzyme. methylation requires just a single bound site and is relatively slow, while cleavage requires that multiple unmethylated target sites (often found in incoming, foreign dna) be brought together into an enzyme-dna complex to license rapid cleavage. to validate and visualize the basis for such a mechanism, we have characterized the catalytic behavior of a bifunctional type iil restriction-modification (‘rm’) enzyme (drdv) and determined its high-resolution structure at several different stages of assembly and coordination with multiple bound dna targets using cryoem.
the structures demonstrate a mechanism of cleavage by which an initial dimer is formed between two dna-bound enzyme molecules, positioning the single endonuclease domain from each enzyme against the other’s dna and requiring further oligomerization through differing protein-protein contacts of additional dna-bound enzyme molecules to enable cleavage. the analysis explains how endonuclease activity is licensed by the presence of multiple target-containing dna duplexes and provides a clear view of the assembly through d space of a dna-bound rm enzyme ‘synapse’ that leads to rapid cleavage of foreign dna. bacterial restriction-modification (rm) systems are ubiquitous and highly diverse defense mechanisms that guard host cells against invasive dna elements, particularly phage genomes (halford, ; loenen et al., b; roberts, ). rm systems pair two competing enzymatic activities: methylation of adenine or cytosine bases within a target site (which protects the host genome from degradation) versus cleavage of dna within or at some distance from unmethylated copies of the same target site (which leads to degradation of foreign dna). in combination with additional innate or ‘preprogrammed’ restriction mechanisms that also act on both host and invader genomes (such as the pgl (sumby and smith, ), brex (goldfarb et al., ), dnd (xu et al., ) and ssp (xiong et al., ) defense systems) and complementary ‘adaptive’ nuclease systems (typified by reprogrammable crispr-associated nucleases (koonin and makarova, )) rm systems represent an important form of antiviral defense in bacteria.
rm systems are loosely divided into at least four major classes, based on their structural composition, biochemical activities and the relationship between their bound dna targets and subsequent cleavage patterns (loenen et al., b). type i and iii rm systems contain atp-dependent translocase domains or subunits that bring together multiple subunits into a dna-bound protein collision complex or synapse, resulting in cleavage either near (type iii) or at some random distance (type i) from their target sites (loenen et al., a; rao et al., ). in contrast, type ii systems do not contain or utilize atp-dependent motors for motion and activity (pingoud et al., ). instead, they rely either on the parallel activities of stand-alone methyltransferase (mtase) and endonuclease (endo) enzymes that independently recognize the same dna target, or on the physical coupling of methylation and endonuclease domains within a single protein chain or a larger multimeric assemblage, so that both functions are simultaneously targeted by a single dna recognition module. type iv endonucleases behave similarly to type ii enzymes, but cleave methylated, rather than unmethylated dna targets (enabling a bacterial response against phage that methylate their own dna target to evade restriction endonuclease activity) (loenen and raleigh, ). rm systems that use a common dna recognition module to simultaneously target their competing dna methylation and cleavage activities have the advantage of facilitating the evolution of new dna specificity (morgan and luyten, ), since any alteration in dna targeting will concurrently alter the specificity of host protective methylation and restrictive cleavage of invading dnas. many such rm systems modify just one dna strand within their asymmetric recognition motif, allowing these systems to employ a single dna recognition module and mtase domain. this presents a distinct challenge, as dna replication produces one daughter dna with no protective methylation. 
such systems must therefore solve the problem of discrimination between self (which should be methylated and protected from cleavage) versus non-self (which should be rapidly cleaved and degraded). systems that communicate between two or more sites within a dna molecule through d translocation, such as the type iii and type isp systems, solve the problem by requiring sites be in a head to head orientation to license cleavage, since this effectively places methylation in both strands. however, there are numerous type iil systems that do not have a translocase function and cut sites without regard to their orientation. how these avoid self-cutting while maintaining sufficient restriction of invading dnas to provide a selective advantage to the host has been an open question. one reasonable (and frequently postulated) solution to that challenge is to ( ) ensure that cleavage is significantly faster than methylation, while also requiring that ( ) multiple unmethylated dna target sites be brought together into an enzyme-dna complex before cleavage is licensed to occur. as a result, whereas an encounter with foreign dna (typically harboring multiple unprotected sites) would lead to rapid cutting at multiple positions, an encounter between an rm enzyme and one or two unmethylated target site(s) on the host would result in eventual dna methylation and release of bound enzyme. a variety of structures of rm enzymes that combine their methylation and cleavage domains and activities into single protein chains or complexes have been solved in the presence and absence of bound dna.
these include: ( ) two single-chain type iil enzymes (mmei (callahan et al., ) and bpusi (shen et al., ), that each contain an n-terminal nuclease domain followed by methyltransferase (mtase) and target recognition domains (trds)). ( ) a pair of single-chain type isp enzymes (llagi and llabiii, that each incorporate an additional reca-like atpase domain into their structures (chand et al., ; Šišáková et al., )). ( ) a crystal structure for a complex of ecop i (a type iii multichain complex containing two mtase subunits and an endonuclease subunit) bound to dna (gupta et al., ). ( ) cryoem structures of the type i enzyme ecor i (a multichain complex containing multiple nuclease- translocase, methyltransferase and specificity subunits) bound to dna (gao et al., ). that recent analysis built upon lower-resolution models of dna-bound ‘m s’ subcomplexes of that same enzyme, as well as those of ecoki and ttei (kennaway et al., ; kennaway et al., ). ( ) a type iv methyl-dependent restriction endonuclease, mspji, in a tetrameric complex of mspji bound to dna (horton et al., ) (although the type iv do not have a mtase domain, the mspji tetrameric complex is relevant to this study). collectively, these analyses have provided considerable insight into the domain organizations, structural dynamics, dna recognition specificity, and (for the type isp llagi and llabiii enzymes) a unique mechanism of translocation and subsequent cleavage of dna (chand et al., ; Šišáková et al., ). however, a high-resolution structure of a multimeric rm enzyme system engaged in simultaneous recognition complexes with multiple dna targets (with the methyltransferase and nuclease domains each properly positioned for competing reaction outcomes) has not yet been described. 
drdv is a single-chain, type iil restriction-modification enzyme of length residues that recognizes the asymmetric dna target site ’ catgnac ’ and methylates an adenine (bold and underlined) in one strand, leading to host protection. it contains an n-terminal nuclease domain, a helical connector region followed by a methyltransferase domain, and a c-terminal target recognition domain (trd). when bound to its dna target, it either methylates the underlined adenine within the target or cleaves the top and bottom strand of foreign dna precisely and basepairs downstream of the target site. in this study, we use cryoem analysis and supporting biochemical experiments to visualize the stepwise formation of a tetrameric assemblage of drdv in complex with independently bound dna target sites. the analysis illustrates the structural basis for generation of an active endonuclease complex of bound enzymes and the basis of crosstalk and cooperativity between multiple copies of the enzyme and bound dna. methods protein expression and purification. the gene encoding the drdv rm system (axg . ) was pcr amplified from deinococcus wulumuqiensis genomic dna using q hot start high-fidelity dna polymerase and cloned into the t expression-based vector psapv (samuelson et al., ) using the nebuilder hifi dna assembly master mix reaction protocol (new england biolabs, ipswich, ma). the plasmid construct was confirmed by dna sequencing of the drdv gene and flanking vector sequence. the verified plasmid construct was transformed and expressed in the e. coli host er (f- l- fhua lacz::t gene [lon] ompt gal attb::(pcd -lysy, laciq) sula r(mcr- ::minitn –tets) [dcm] r(zgb- ::tn –tets) enda d(mcrc-mrr) ::is ).
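because the catgnac recognition site described above is asymmetric, a given dna can carry it on either strand. the following is a minimal python sketch, not part of the published methods, of how one might locate candidate sites on both strands of a sequence. the function names and the demo sequence are hypothetical and chosen only for illustration; "n" is treated as any of the four bases.

```python
import re

# Recognition motif as given in the text; "n" stands for any base.
SITE = "catgnac"

# Minimal IUPAC table covering only the symbols needed here (an assumption:
# other ambiguity codes are not handled in this sketch).
_IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t", "n": "[acgt]"}
_COMPLEMENT = str.maketrans("acgtn", "tgcan")


def motif_to_regex(motif: str) -> str:
    """Translate a motif with 'n' wildcards into a regular expression."""
    return "".join(_IUPAC[base] for base in motif)


def find_sites(seq: str, motif: str = SITE):
    """Return sorted (start, strand) pairs for every motif occurrence.

    '+' means the motif reads 5'->3' on the given strand; '-' means it reads
    5'->3' on the complementary strand. Starts are 0-based positions on the
    given strand. Lookahead is used so overlapping matches are not skipped.
    """
    seq = seq.lower()
    reverse_motif = motif.translate(_COMPLEMENT)[::-1]  # catgnac -> gtncatg
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_motif)):
        regex = re.compile(f"(?=({motif_to_regex(pattern)}))")
        hits.extend((match.start(), strand) for match in regex.finditer(seq))
    return sorted(hits)


# Hypothetical demo sequence with one site on each strand.
demo_hits = find_sites("ttcatggacaagtccatgtt")

if __name__ == "__main__":
    for start, strand in demo_hits:
        print(start, strand)
```

counting the hits returned for a substrate is one simple way to distinguish single-site from multi-site dna before setting up a digest.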
drdv endonuclease was purified from g of cells grown at °c in rich media supplemented with % glycerol and . % glucose and containing µg/ml chloramphenicol. cells were induced at a final concentration of . mm iptg and grown for an additional hours at °c before harvest. cells were resuspended in volumes deae buffer ( mm nacl, mm tris ph , . mm edta, mm dtt, % glycerol), lysed using microfluidics microfluidizer m eh (microfluidics, westwood, ma) and cell debris removed by centrifugation at , xg for min. drdv endonuclease was purified to near homogeneity via four sequential chromatographic steps: deae anion exchange, heparin hyperd, source q, and source s (supplemental figure s , panel a). the clarified lysate ( ml) was first applied to deae ( ml column bed volume, ph . ) and then washed with column volumes ( ml) of deae buffer. the flow-through and wash were pooled ( ml), diluted with no-salt deae buffer ( mm tris ph , . mm edta, mm dtt, % glycerol) to a final nacl concentration of mm, and applied to a heparin hyperd column (ph . ). a salt gradient was run from mm to mm nacl and ml fractions were collected. drdv eluted across fractions to ( ml total volume). those fractions were diluted to mm nacl and applied to a sourceq column, and the protein eluted via a salt gradient while collecting ml fractions. drdv eluted across fractions to ( ml total volume; mg total protein). the fractions were pooled, diluted to mm nacl and applied to a source s column at ph . , and eluted via a salt gradient into ml fractions. drdv eluted across fractions to ( ml total volume). they were pooled and dialyzed into storage buffer ( mm nacl, mm tris ph . , . mm edta, mm dtt, % glycerol). milligrams ( .
grams) of purified protein was stored at a final concentration of mg/ml. analytical size exclusion chromatography (sec) demonstrated that the purified drdv protein eluted at a volume corresponding to a molecular weight of approximately kilodaltons, consistent with a monomer in solution. upon incubation, in the presence of calcium, with an equimolar amount of a double stranded dna (dsdna) duplex containing a single enzyme target site (consisting of a top strand with sequence ’ -cagcccatggacccagaaccac/ccacc- ’ (underline = target site; “/” = cut site) and its complementary bottom strand with sequence ’- gtcgggtacctgggtcttgg/tgggtgg ’), the protein co-eluted with the dna at a volume corresponding to an approximate molecular weight of kilodaltons, suggesting the formation of a tetrameric enzyme-dna complex (supplemental figure s , panel b). drdv endonuclease and methyltransferase assays. endonuclease activity was assayed in nebuffer ( mm tris-acetate, ph . , mm magnesium acetate, mm potassium acetate, mm dtt) supplemented with µm s-adenosyl-methionine (adomet), typically using µg dna substrate per µl reaction volume at °c. reactions were terminated by adding stop solution containing . % sds (neb gel loading dye, purple) and dna fragments were analyzed by electrophoresis in agarose gels. methyltransferase activity was assayed in the same buffer, supplemented with . mm edta (to remove mg++) and µm adomet.
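the oligonucleotide sequences used in this work lend themselves to a quick consistency check. the python sketch below is illustrative only: it verifies that the two strands of the single-site duplex given above are complementary, derives the cut positions relative to the target from the “/” marks, and checks that the hairpin oligonucleotide used for the trans-activation assays folds into a fully paired stem. helper names are hypothetical, the bottom strand is assumed to be written in the 3’ to 5’ direction (aligned under the top strand), and the four-thymine loop is an assumption inferred from the hairpin sequence rather than stated in the text.

```python
# Sanity checks on the assay oligonucleotides quoted in the methods.
# Helper names are illustrative; only the sequences come from the text.

_COMPLEMENT = str.maketrans("acgt", "tgca")


def complement(seq: str) -> str:
    """Base-by-base complement, in the same left-to-right order as the input."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complementary strand read in its own 5'->3' direction."""
    return complement(seq)[::-1]


# Single-site duplex; "/" marks the cut site on each strand.
# Assumption: the bottom strand is written 3'->5', aligned under the top strand.
TOP = "cagcccatggacccagaaccac/ccacc"
BOTTOM = "gtcgggtacctgggtcttgg/tgggtgg"
TARGET = "catggac"  # catgnac with n = g

top = TOP.replace("/", "")
bottom = BOTTOM.replace("/", "")
duplex_is_complementary = complement(top) == bottom

# Number of base pairs between the end of the target and each cut,
# measured in top-strand coordinates.
target_end = top.index(TARGET) + len(TARGET)
top_cut_offset = TOP.index("/") - target_end
bottom_cut_offset = BOTTOM.index("/") - target_end

# Hairpin oligonucleotide from the trans-activation assays.
HAIRPIN = "gtgctcaggtccatgagcgagtcttttgactcgctcatggacctgagcactc"
LOOP = "tttt"  # assumption: loop inferred from the sequence, not given in the text

five_prime_arm, _, rest = HAIRPIN.partition(LOOP)
three_prime_arm = rest[: len(five_prime_arm)]
overhang = rest[len(five_prime_arm):]
stem_is_paired = reverse_complement(five_prime_arm) == three_prime_arm

if __name__ == "__main__":
    print("duplex complementary:", duplex_is_complementary)
    print("cut offsets (top, bottom):", top_cut_offset, bottom_cut_offset)
    print("hairpin stem paired:", stem_is_paired, "| 3' overhang:", overhang)
```

the same helpers can be reused for any other substrate, provided each strand is entered in the orientation assumed above.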
cleavage assays that illustrated the trans-activation of the endonuclease via the addition of dsdna harboring the enzyme’s target site were performed in the presence of an added oligonucleotide (sequence ’- gtgctcaggtccatgagcgagtcttttgactcgctcatggacctgagcactc - ’) that forms a short hairpin double-stranded dna duplex (figure ) containing the catgnac recognition site (top and bottom strands of the target corresponding to the underlined bases in the sequence shown) with basepairs upstream ( ’) and basepairs downstream ( ’) of the target, terminating immediately prior to the site of dna cutting. structural visualization via electron microscopy. the protein-dna complexes were initially evaluated by negative-stained tem (supplemental figure s , panels c, d, e) followed by screening cooling and vitrification conditions and initial data collection using a glacios kv electron microscope (supplemental figure s ). a subsequent data set was collected on a krios electron microscope (supplemental figure s ). all data preprocessing, which included motion correction, ctf estimation, and exposure curation, as well as d particle curation, d model generation/refinement, and post refinement were performed using the software package cryosparc (punjani et al., ). for each movie stack, the frames were aligned for beam-induced motion correction using patch-motion-correction. patch-ctf was used to determine the contrast transfer function parameters. bad movies were eliminated based on a ctf-fit resolution cut off at Å and relative ice thickness of . estimated from the ctf function by cryosparc . different particle picking algorithms,
including manual pick, template-based and blob picking, were applied to the same dataset, and the resulting model distributions were compared. the evaluation of the density map at all stages and initial fitting of the phyre (kelley et al., ) predicted model to the final density map were accomplished in chimera (pettersen et al., ). the final structures were built and refined with the program coot (emsley et al., ). i. negative stain transmission electron microscopy (tem). negative-stain grids (supplemental figure s , panel c) were prepared by the application of µl of sec purified samples to a glow discharged uniform carbon film coated grid. the particles were allowed to adsorb to the surface for to seconds. excess solution was wicked away by briefly touching the edge of a filter paper. the grid was quickly washed three times with µl drops of water and once with a drop of µl . % uranyl formate (uf) followed by staining for ~ second with a µl uf. the grids were air-dried for at least hours prior to inspection on an in-house jeol jm microscope (operating at kv) equipped with a gatan rio kx k cmos detector. both dna free drdv and drdv/dna complex were distributed homogeneously in random orientations over the surface of the carbon film. a small dataset of micrographs was collected using the automated data collection package leginon (suloway et al., ) from the negative-stained specimen at a pixel size of . Å on a fei tecnai spirit electron microscope (operating at kv) equipped with a gatan k x k ccd detector. initially particles were hand-picked from all micrographs of the negative stained dataset and subjected to reference free d classification. six out of ten d class averages (supplemental figure s , panel d) from d classification were used to reconstruct a four-lobed volume with imperfect two-fold symmetry. homogeneous refinement with c or c symmetry yielded envelopes at approximately Å and Å resolution at a gold standard fourier shell correlation (gsfsc) of . , respectively.
(supplemental figure s , panel e). ii. initial cryoem screening and analyses (see supplemental figure s ). using the same protein-dna complex, cryoem grids were prepared by applying μl of a drdv-dna complex with an absorbance of . od at nm (approximately . mg/ml protein based on quantitative sds-page analysis) to a glow-discharged quantifoil . / . holey carbon film coated copper grid, which was blotted for . s and plunge-frozen in liquid ethane using an fei vitrobot mark iv. screening datasets with a total of movies were collected from two separate grids on a glacios electron-microscope (operating at kv) equipped with a gatan k -summit direct electron detector at a pixel size of . Å. the same six selected d classes of the negative stained particles (supplemental figure s , panel d) were used as templates for automatic templated particle picking from a total of movies after frame alignment and manual exposure curation. after “inspect particle picks” and “local motion correction”, out of particles were accepted for d classification. after rounds of particle curation, particles from selected classes (out of ) were used for ab-initio reconstruction of one unique d model. this initial d cryoem reconstruction of the drdv-dna complex showed an asymmetric particle with three rather than four lobes as shown for the negative-stained particles, which ruled out higher symmetry of c . hence all further refinement processes were performed with c symmetry to avoid biased interpretation of the resulting maps, resulting in a trimer map of . Å at a gsfsc of . between the two half maps.
d variability analysis (punjani and fleet, ) of this map showed that the most prominent component arises from the association/dissociation of a third protomer to a dimer core (supplemental movie ). ab-initio d reconstruction with four models revealed the presence of a dimer, a partial trimer with an ill-defined dimer core, and a full-trimer (at . %, . % and . %, respectively) plus a small percentage of smaller fragments ( . %). homogeneous, nonuniform refinement followed by local refinement resulted in a map at . Å for a dimer and a map at . Å for a full trimer (supplemental figure s and figure , panel a). iii. final cryoem data collection (see supplemental figure s ). a final dataset was collected at the pacific northwest center for cryoem (pncc) using a vitrified grid prepared with the complex at a final concentration of ~ . mg/ml protein (diluted immediately before application to the grid from a stock solution at ~ . mg/ml protein) on a quantifoil . / . , mesh copper grid, using a titan krios electron microscope (fei) operating at kv, equipped with a gatan k direct electron detector and an energy filter (operated with a slit width of ev) at a super resolution pixel size of . Å. the data was binned by a factor of two to a pixel size of . Å. after preprocessing (motion correction, ctf estimate and manual exposure curation), micrographs were accepted from a total of movies. an automated ‘blob picker’ algorithm with maximum and minimum diameters of Å and Å was used for particle picking. after inspection and three rounds of particle curation, particles were selected for d reconstruction and refinements. ab- initio d reconstruction with three models showed that the dataset contained three different classes – trimers (~ %), tetramers (~ %), and a higher molecular aggregate (~ %) that could be the result of the addition of extra monomers to the tetramers or the result of close contact of neighboring particles. 
dimers were absent in the pncc dataset, which was prepared from a stock solution at much higher concentration and diluted immediately prior to the preparation of the grids. after homogeneous and non-uniform refinement, the refined tetramer class was further refined after local- and global-ctf refinement of the particles, which led to a final resolution of . Å at a gsfsc of . between the two half maps. d variability display showed that the association-dissociation of a fourth component to the trimer is the most prominent contribution to the d variability, thus the particles under the tetramer class (supplemental figure s , tetramer i) were further classified via a second round of ab-initio d reconstruction with three models, resulting in a full trimer ( . %), a tetramer ( . %) and a class of small fragments ( . %). final refinements of the trimer and tetramer led to a resolution of better than . Å at a gsfsc of . for both d classes (supplemental figure s ). even though the nominal resolution of tetramer i is higher than that of tetramer ii, the quality of the latter is actually superior, especially for the mtase and trd domains and dna of the fourth protomer.
the density maps of all the enzyme-dna complexes, visualized at three unique stages of assembly (dimers, trimers and full tetramers), each allowed unambiguous placement of individual protein chains, each containing all residues with the exception of a short surface-exposed loop in the methyltransferase domain (residue – ) (supplemental figure s ), as well as bound copies of the dna duplex, a bound sam cofactor, a base-flipped adenine nucleotide in each methyltransferase active site, and calcium ions associated with each endonuclease domain in contact with a dna strand and scissile phosphate. the map of the tetramers indicated a reduced occupancy for one of the subunits and its bound dna duplex. results and discussion biochemical activity assays. a series of in vitro biochemical analyses (figure ) of drdv activity demonstrate that the drdv enzyme displays the mechanistic behavior described above, which is believed to lead to different reaction outcomes against ‘self’ versus ‘foreign’ dna targets: much slower methylation than cleavage, strong activation of cleavage via binding of multiple dna targets, and coordinated, near-simultaneous cleavage of both strands in multiple target sites within a dna substrate. (i) cleavage of unmethylated targets by drdv is significantly faster than the rate of host-protective methylation (figure a). in a series of in vitro incubations with a standard multisite substrate (lambda dna), dna cleavage is nearly complete within to minutes, whereas complete methylation of the same substrate under similar conditions (except for the absence of mg++ to prevent dna cleavage) requires up to hours. (ii) drdv requires multiple sites for efficient, high fidelity cleavage. drdv cleaves a dna substrate containing a single target site (a puc plasmid with a drdv target site added at position ) incompletely, cutting only around % of the dna even with an -fold excess of enzyme (figure b, left).
at higher excess enzyme, star activity (cleavage at closely related non-cognate sites) appears as drdv begins to make additional, though very partial, cuts at near-cognate sites. in contrast, cleavage activity towards the same plasmid substrate is significantly increased (and off-target star activity is reduced) by supplying in trans a short dsdna hairpin oligonucleotide that contains the drdv recognition site (figure b, right). maximum stimulation is achieved when the oligo is supplied at a ratio of between : to approximately : to drdv enzyme molecules, with enzyme molecules in excess over the substrate target sites to be cut.
(iii) drdv simultaneously cleaves both dna strands, downstream of the target site, in a coordinated manner. drdv digestion of a circular dna substrate (pbr plasmid) containing multiple target sites showed little accumulation of the nicked open circular dna form (figure c), indicating that both strands are cleaved in a nearly concerted event. the plasmids were initially cut both to linear fragments cut at one site only and to fragments cut at two sites, indicating that cleavage can occur at just one site or at two sites in a coordinated manner.
cryoem structural analysis. the purified enzyme eluted with an apparent mass of kd from a final size exclusion column. upon incubation with an equimolar ratio of a dna duplex containing a single copy of its dna target site, the protein co-eluted with the dna over a sharp peak centered at an estimated molecular weight of approximately kd.
that sample was used for negative stain em studies and single particle reconstruction, resulting in a molecular envelope corresponding to an asymmetric tetrameric assemblage with a pair of pseudo-orthogonal dyad symmetry axes, of approximate size x x Å (supplemental figure s e). in addition to these largest particles, smaller particles corresponding to intermediate bi- and tri-lobed assemblages were also observed. we interpreted this result as potentially representing a population of enzyme tetramers bound to multiple dna targets, interspersed with smaller dimeric and trimeric enzyme-dna assemblages. the subsequent cryoem single particle reconstructions showed that drdv-dna complexes undergo a concentration-dependent oligomerization, producing density maps with two, three and four lobes corresponding to dimer, trimer and tetramer. these maps allowed unambiguous placement and subsequent building and refinement of unique enzyme-dna complexes containing two, three or four protein subunits. all particles contain a highly homologous dimeric core, with one or two extra protomers at either side in the trimer and tetramer, respectively (figure , supplemental figures s and s , supplemental movies). the final maps (individually corresponding to . to . Å resolution) provided well-resolved features that allowed unambiguous modeling of secondary structure elements and corresponding side chain positions across four sequential functional regions and folded domains within each protein subunit of the enzyme: an n-terminal endonuclease domain, an alpha-helical connector, a methyltransferase domain and a c-terminal target recognition domain, or ‘trd’ (figure , and supplemental movie ). the relative local resolution distribution of all three density maps on the same scale and the sequence of a drdv subunit with corresponding secondary structures are shown in supplemental figure s .
the description of the enzyme-dna complex features provided below is derived from the density map of the largest observed enzyme assemblage. those features are also observed in the structures of the dimeric and trimeric species (solved and refined independently), with the exception of small conformational changes that appear to accompany the stepwise addition of the third and fourth enzyme subunits. the relative domain orientations within a single dna-bound enzyme subunit and its interactions with its target site (observed in all of the structures) are illustrated in figure . the dna target site (with bases numbered according to their position in the target site, i.e. ‘ - c a t g g a c - ’; figure a) is bound in a cleft between the methyltransferase domain (mtase, residues - ) and target recognition domain (trd, residues - ) (figure b). the adenine base at position (‘a ’) is flipped into the methyltransferase active site and positioned near a bound molecule of s-adenosyl-methionine (‘sam’ or ‘adomet’) (figure d,e). sequence-specific base contacts by enzyme side chains are observed to six (out of the seven) base pairs within the target site, with only the guanine at the fifth position in the target site (where any basepair is tolerated by the enzyme) excluded from direct readout (figure and supplemental figure s ). individual side chain contacts to the target bases are formed both by the mtase domain (n , q , k , k , n , r , d ) and by the trd (n , r , y , d , k ). the flipped adenine base is bracketed by π-stacking with f and y and forms additional contacts with f and n . the space vacated by the flipped adenine a is occupied by n and k from the side of the major groove.
within the dna-enzyme complex formed by a single drdv subunit as described above, the endonuclease domain is not in contact with the bound dna duplex (figure bc). instead, it contacts the top strand and the corresponding scissile phosphate in the dna duplex bound by the opposing subunit within a central dimeric enzyme assemblage (figure ab). the endonuclease domain from the opposing enzyme subunit is similarly domain swapped; both endonuclease domains are properly positioned to cleave the top strand of the dna duplex that is engaged by the opposing enzyme subunit (figure c). the enzyme dimer displays an extensive buried interface between the two helical connector domains, largely composed of a pair of buried, symmetrically equivalent electrostatic networks and surrounding hydrophobic and hydrogen-bonded contacts with neighboring residues (figure d). within each network, a cluster of three acidic residues from one subunit (d , e and d ) is engaged with a corresponding cluster of three basic residues from the opposing subunit (k , r and k ), thereby bringing together at least opposing charged residues. this electrostatic network is augmented by two patches of electrostatic π-stacking interactions between r of one subunit and q and y of a second subunit and vice versa. in fact, y of the mtase domain is the only residue outside of the helical connector domains that is involved in the interactions between the two subunits in the core dimer. in each endonuclease-dna interface (figure c) the scissile phosphate is engaged in contacts with a divalent metal ion (a calcium, which was present in the enzyme buffer to prevent cleavage) complexed by a pair of conserved acidic residues (d and e ). a neighboring lysine residue (k ) and nearby additional glutamic acid (e ) complete the nuclease active site.
the conserved lysine of the canonical pd-exk endonuclease motif (k , mutation of which abolishes catalysis) is also positioned near the scissile phosphate, where it could participate in catalysis upon adopting a different rotamer than that observed with calcium present in the structure. the dimeric assemblage of enzyme-dna complexes described above is further augmented by additional bound drdv subunits, forming dna-bound trimeric and tetrameric complexes (figure and figure ). (in the tetrameric particles, the fourth and final subunit displays partial, sub-stoichiometric occupancy.) in those structures, the additional enzyme subunits are positioned on either side of the dimer described above, via an additional protein-protein interface between two endonuclease domains (figure a). this interface is again composed primarily of a pair of symmetry-related clusters of opposing charged residues (figure b), each of which corresponds to k and d from one subunit forming a pair of buried electrostatic contacts with e and r from the opposing subunit. the additional endonuclease domain also contacts an extension from the trd domain of the enzyme bound to the dna strand being cleaved. the dimerization of the two endonuclease domains places a pair of active sites in alignment with the appropriate scissile phosphates of each strand in a dna duplex, which in turn allows the enzyme to generate a double strand-break (corresponding to a -base ’ product overhang) downstream of the bound target site (figure a, box and inset).
as a result of the assembly and coordination of the cleavage complex described above, no fewer than three individual protein subunits and two bound dna targets are required in order to form all contacts necessary to cleave a single bound dna duplex (figure c), and four protein subunits are required in order to simultaneously cleave two dna duplexes.
discussion and conclusions. the biochemical activities and corresponding structures presented in this study reinforce (and illustrate a structural basis for) the concept that an invading dna such as a phage genome, which presents multiple unmodified sites in a single construct, will be rapidly cleaved, whereas the generation of individual unmodified sites in one daughter chromosome following replication would favor modification, as there is less chance to assemble multiple site-bound molecules into cleavage-competent complexes. the inefficient cleavage of single site substrates, and activation by specific dna target sites in trans, indicate that drdv must interact with multiple sites to achieve rapid and efficient dna cleavage. the structures of drdv described here (in several stages of assembly with multiple bound dna targets) offer a sequential view of such an enzyme before and during each stage of dna search, encounter, coordination and cleavage. when considered alongside previous crystallographic structures of two related type iil enzymes at earlier points in their action (bpusi, solved in the absence of a bound dna target (shen et al., ) and mmei, bound to a single copy of its dna target, in a monomeric complex (callahan et al., )), a rather complete picture of the functional cycle and mechanism for such bifunctional r-m enzymes seems to emerge.
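the distinction drawn above between single-site and multisite substrates can be made concrete with a short script. the following is a minimal illustrative sketch in python, not part of the methods of this study: it locates copies of the drdv recognition sequence (catggac, as given in the text) on both strands of a linear or circular dna sequence. the function names (reverse_complement, find_sites, is_multisite) and the example sequence are hypothetical and chosen only for illustration, and the script makes no attempt to predict cleavage positions or cleavage efficiency.

```python
# Illustrative sketch only; names and example sequences are hypothetical.
RECOGNITION_SITE = "CATGGAC"  # drdv target site as stated in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str = RECOGNITION_SITE, circular: bool = False):
    """Return sorted (position, strand) pairs for each copy of `site`.

    Positions are 0-based indices on the given (top) strand. A '-' hit means
    the site lies on the bottom strand, i.e. the top strand carries the
    reverse complement of the site at that position.
    """
    seq = seq.upper()
    # For circular substrates (e.g. plasmids), extend the sequence so that
    # sites spanning the origin are also found.
    search = seq + seq[: len(site) - 1] if circular else seq
    hits = []
    for strand, motif in (("+", site.upper()), ("-", reverse_complement(site))):
        start = search.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = search.find(motif, start + 1)
    return sorted(hits)


def is_multisite(seq: str, circular: bool = False) -> bool:
    """True if the substrate carries more than one recognition site."""
    return len(find_sites(seq, circular=circular)) > 1
```

for the made-up sequence "AACATGGACTTGTCCATGAA", find_sites returns [(2, '+'), (11, '-')], so is_multisite reports a two-site substrate; a sequence carrying a single copy of the site would be reported as single-site.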
in those earlier crystal structures, the n-terminal endonuclease domain in unbound bpusi was found to be well-resolved and packed against the interface between its downstream mtase and trd domains, in a manner that would require its release in order to bind dna, effectively sequestering the endonuclease catalytic site to prevent any dna cutting. in contrast, the endonuclease domain in the dna-bound mmei enzyme was unobservable (and presumably displaying considerable motion and flexibility), suggesting release from the sequestered position and search for a partner upon initial recognition and engagement of its specific target site. like drdv, mmei requires multiple sites for cutting and is stimulated by dna containing a recognition site supplied in trans. a simple model of type iil enzyme function in a cell would be one in which the apo enzyme is in an inactive (endonuclease-sequestered) state that scans dna. upon encounter of its specific recognition motif, the enzyme engages in a tight, long-lived complex that releases the endonuclease domain, but the endonuclease domain is not in contact with the dna bound by the enzyme. target recognition and binding would then be followed by a kinetic competition between two outcomes: eventual methylation and enzyme release, or encounter and capture of an additional target-bound enzyme subunit to form a dimer with exchange of dna helices to the partner's endonuclease domain.
formation of the central dimer is again followed by a kinetic competition between two outcomes: eventual methylation and enzyme release if no additional dna-bound partners are encountered, or encounter and capture of one or two additional target-bound enzyme subunits to form a catalytically competent trimer or tetramer, leading to rapid cleavage of the two dnas bound by the central dimer subunits. the structural analyses presented here also demonstrate that relatively little conformational difference exists between the two core dna-bound drdv subunits present in the central dimer particles and the two additional dna-bound enzyme subunits that bind to the dimeric assemblage through their endonuclease domains to form the cleavage-competent complex. however, examination of differences between those structures does indicate that the formation of endonuclease dimers at each dna cleavage site (corresponding to the conversion from dna-bound dimers to larger trimeric and/or tetrameric complexes) is accompanied by observable deformation of the dna substrates as part of the cleavage mechanism within each bound dna duplex, and a hinged rigid body rotation of the endonuclease domain by approximately o in the catalytic partner subunits relative to that in the central dimer. this rotation highlights the importance of dynamic flexibility of the endonuclease domain relative to the mtase and trd portion of the protein. the mtase and dna recognition domains of drdv and those of the type isp enzymes llagi and llabiii (kulkarni et al., ) are highly similar, indicating evolution from a common origin, yet the way restrictive dna cleavage is achieved and controlled in these type iil and type isp rm systems is quite different. the type isp enzymes license their endonucleases for cutting through collision encounters between enzymes translocating on the dna in opposite directions from inverted recognition sites.
their endonuclease domains never actually encounter one another, but simply nick one strand of the dna multiple times on either side of the collision complex, eventually leading to double strand breaks when the nicks occur close together as the enzymes move against one another. in stark contrast, drdv remains bound to its recognition site and recruits additional dna-bound enzyme molecules, first to form a non-catalytic dimer using one set of protein contacts between their linker and methylase domains that positions the endonuclease of each subunit against the dna of the other, and then to form catalytic complexes through a different set of protein contacts, largely between endonuclease domains, to bring two endonuclease catalytic centers together for double strand cleavage at a fixed distance from the bound recognition site. this implies that the endonuclease domains of type iil enzymes are under greater evolutionary pressure and corresponding sequence constraint, as they must form multiple protein-protein contacts as well as position the catalytic center for cutting. restriction-modification systems such as those that rely on recognition and cleavage of specific target sequences in foreign dna are complemented by additional phage restriction mechanisms and systems (such as the pgl (sumby and smith, ), brex (goldfarb et al., ), dnd (xu et al., ) and ssp (xiong et al., ) systems) that also utilize a site-specific protective activity (usually a methyltransferase) to protect the bacterial genome from self-destruction.
the exact manner in which the self-modifying protective activity is employed differs between systems (for example, the methyltransferase activity in the pgl and brex systems requires the presence of one or more additional protein factors in order to methylate host dna). regardless, the observations described here, which demonstrate the basis of at least one mechanism by which protective versus destructive activities in a restriction system can be biased towards self and foreign, respectively, may be reflected (with many possible variations on a theme) within a wide range of alternative forms of cellular defense.
acknowledgements
this work was supported by nih grant r gm to bls, by an amazon cloud credit to bws, by the fred hutchinson cancer research center, and by new england biolabs. a portion of this research was supported by nih grant u gm and performed at the pncc at ohsu and accessed through emsl (grid. . ), a doe office of science user facility sponsored by the office of biological and environmental research. we thank justin kollman and david veesler at the university of washington for advice and assistance, janette myer for krios data collection, jeff tucker and dan tenenbaum for assistance in aws ec setup, melody campbell for critical reading of the manuscript, and sue biggins and richard roberts for support, encouragement and advice.
competing interests statement
yl and rdm are employees of new england biolabs, a manufacturer of reagents, enzymes and tools for molecular biology. the enzyme described in this study, and/or ones similar to it, are commercial products produced by neb.
references callahan, s.j., luyten, y.a., gupta, y.k., wilson, g.g., roberts, r.j., morgan, r.d., and aggarwal, a.k. ( ). structure of type iil restriction-modification enzyme mmei in complex with dna has implications for engineering new specificities. plos biology , e . chand, m.k., nirwan, n., diffin, f.m., van aelst, k., kulkarni, m., pernstich, c., szczelkun, m.d., and saikrishnan, k. ( ). translocation-coupled dna cleavage by the type isp restriction-modification enzymes. nature chemical biology , - . emsley, p., lohkamp, b., scott, w.g., and cowtan, k. ( ). features and development of coot. acta crystallogr d biol crystallogr , - . gao, y., cao, d., zhu, j., feng, h., luo, x., liu, s., yan, x.x., zhang, x., and gao, p. ( ). structural insights into assembly, operation and inhibition of a type i restriction-modification system. nature microbiology. goldfarb, t., sberro, h., weinstock, e., cohen, o., doron, s., charpak-amikam, y., afik, s., ofir, g., and sorek, r. ( ). brex is a novel phage resistance system widespread in microbial genomes. embo j , - . gupta, y.k., chan, s.h., xu, s.y., and aggarwal, a.k. ( ). structural basis of asymmetric dna methylation and atp-triggered long-range diffusion by ecop i. nat commun , . halford, s.e. ( ). restriction enzymes - the (billion dollar) consequences of studying why certain isolates of phage infect only certain strains of e. coli. biochemist , - . horton, j.r., mabuchi, m.y., cohen-karni, d., zhang, x., griggs, r.m., samaranayake, m., roberts, r.j., zheng, y., and cheng, x. ( ). structure and cleavage activity of the tetrameric mspji dna modification-dependent restriction endonuclease. nucleic acids research , - . kelley, l.a., mezulis, s., yates, c.m., wass, m.n., and sternberg, m.j. ( ). the phyre web portal for protein modeling, prediction and analysis. nat protoc , - . kennaway, c.k., obarska-kosinska, a., white, j.h., tuszynska, i., cooper, l.p., bujnicki, j.m., trinick, j., and dryden, d.t. ( ). 
the structure of m.ecoki type i dna methyltransferase with a dna mimic antirestriction protein. nucleic acids research , - . kennaway, c.k., taylor, j.e., song, c.f., potrzebowski, w., nicholson, w., white, j.h., swiderska, a., obarska-kosinska, a., callow, p., cooper, l.p., et al. ( ). structure and operation of the dna-translocating type i dna restriction enzymes. genes & development , - . koonin, e.v., and makarova, k.s. ( ). origins and evolution of crispr-cas systems. philosophical transactions of the royal society of london. series b, biological sciences , . kulkarni, m., nirwan, n., van aelst, k., szczelkun, m.d., and saikrishnan, k. ( ). structural insights into dna sequence recognition by type isp restriction-modification enzymes. nucleic acids research , - . loenen, w.a., dryden, d.t., raleigh, e.a., and wilson, g.g. ( a). type i restriction enzymes and their relatives. nucleic acids research , - . loenen, w.a., dryden, d.t., raleigh, e.a., wilson, g.g., and murray, n.e. ( b). highlights of the dna cutters: a short history of the restriction enzymes. nucleic acids research , - . loenen, w.a., and raleigh, e.a. ( ). the other face of restriction: modification-dependent enzymes. nucleic acids research , - . morgan, r.d., and luyten, y.a. ( ). rational engineering of type ii restriction endonuclease dna binding and cleavage specificity. nucleic acids research , - . pettersen, e.f., goddard, t.d., huang, c.c., couch, g.s., greenblatt, d.m., meng, e.c., and ferrin, t.e. ( ). ucsf chimera--a visualization system for exploratory research and analysis. j comput chem , - . pingoud, a., wilson, g.g., and wende, w. ( ). type ii restriction endonucleases--a historical perspective and more.
nucleic acids research , - . punjani, a., and fleet, d.j. ( ). d variability analysis: directly resolving continuous flexibility and discrete heterogeneity from single particle cryo-em images. biorxiv https://doi.org/ . / . . . . punjani, a., rubinstein, j.l., fleet, d.j., and brubaker, m.a. ( ). cryosparc: algorithms for rapid unsupervised cryo-em structure determination. nat methods , - . rao, d.n., dryden, d.t., and bheemanaik, s. ( ). type iii restriction-modification enzymes: a historical perspective. nucleic acids research , - . roberts, r.j. ( ). how restriction enzymes became the workhorses of molecular biology. proc natl acad sci u s a , - . samuelson, j.c., zhu, z., and xu, s.y. ( ). the isolation of strand-specific nicking endonucleases from a randomized sapi expression library. nucleic acids research , - . shen, b.w., xu, d., chan, s.-h., zheng, y., zhu, z., xu, s.-y., and stoddard, b.l. ( ). characterization and crystal structure of the type iig restriction endonuclease rm.bpusi. nucleic acids research , - . Šišáková, e., van aelst, k., diffin, f.m., and szczelkun, m.d. ( ). the type isp restriction- modification enzymes llabiii and llagi use a translocation-collision mechanism to cleave non-specific dna distant from their recognition sites. nucleic acids research , - . suloway, c., pulokas, j., fellmann, d., cheng, a., guerra, f., quispe, j., stagg, s., potter, c.s., and carragher, b. ( ). automated molecular microscopy: the new leginon system. j struct biol , - . sumby, p., and smith, m.c. ( ). genetics of the phage growth limitation (pgl) system of streptomyces coelicolor a ( ). molecular microbiology , - . xiong, x., wu, g., wei, y., liu, l., zhang, y., su, r., jiang, x., li, m., gao, h., tian, x., et al. ( ). sspabcd-sspe is a phosphorothioation-sensing bacterial defence system with broad anti-phage activities. nature microbiology. xu, t., yao, f., zhou, x., deng, z., and you, d. ( ). 
a novel host-specific restriction system associated with dna backbone s-modification in salmonella. nucleic acids research , - .
figure legends.
figure . in vitro biochemical analyses of drdv activities. see methods for details of all reaction conditions. panel a: methylation by drdv is much slower than endonuclease cleavage. left: time course of drdv incubation in buffer with sam (adomet) but without mg++ (to prevent cleavage). the dna substrate (pad bsabi plasmid containing drdv sites) was incubated with unit drdv for the indicated time, then immediately purified using a spin column. the purified dna was then challenged by cutting with drdv (now in the presence of mg++) to assess methylation status. some partial methylation is observed starting at minutes, but full methylation requires between and hrs. right: same time course in buffer containing mg++. cleavage is % complete within minutes and fully complete in hour. panel b: cleavage is activated by the presence of a dna target site added in trans. left: drdv cleavage of a puc plasmid substrate (linearized with psti) that harbors a single drdv site: -fold serial dilution of drdv from to . units. the extra bands indicate star activity at near-cognate drdv sites in the presence of the highest amounts ( and units) of drdv. right: drdv cleavage of the same substrate, at the same enzyme concentrations, each in the presence of mm of a short dna hairpin duplex containing the drdv target site. cleavage goes to completion and displays greatly reduced off-target cutting. panel c: drdv cleaves both dna strands at multiple target sites in a coordinated manner. left: time course of drdv digestion of supercoiled pbr dna ( units/ug) for s, s, , , , , and min.
supercoiled plasmid is converted directly to linear plasmid cut at one site, or to fragments representing cutting at two sites, with very little appearance of open circle (oc) dna nicked in one strand only. subsequently the dna is cut at all three sites. right: -fold serial dilution of drdv from to . units/ug pbr substrate. at limiting enzyme, the majority of the cut dna represents cutting at one site to linearize the plasmid.
figure . cryoem analysis of drdv-dna complex. panel a: density maps of dimeric, trimeric and tetrameric dna-bound enzyme complexes. the fourth and final subunit in the tetrameric enzyme assemblage displays sub-stoichiometric partial occupancy. see also supplementary movie . panel b: superposition of all three density maps. panel c: front and back views of the drdv tetramer density maps with individually colored enzyme subunits and bound dna duplexes. panel d: atomic model of the drdv tetrameric assembly. panel e: model of the central dna-bound enzyme dimer (subunits a and b) extracted from the tetrameric assemblage, overlaid with boxes corresponding to representative regions of density and respective models shown in panel f. panel f: close-up views of cryoem map corresponding to (i) the n-terminal endonuclease domain; (ii and iii) the central methyltransferase domain and (iv) the c-terminal target recognition domain (trd) as indicated by boxes in panel e.
figure . conformation of an individual drdv subunit bound to a dna target. the model and maps shown are extracted from the tetrameric enzyme assemblage. panel a: dna construct used for cryoem analyses. the duplex consists of complementary basepairs and spans both the enzyme’s seven basepair target site (red bases and blue underlined site of adenine methylation) and its downstream cleavage sites on the top and bottom strands (blue bases flanking the scissile phosphates, and basepairs downstream from the final basepair of the target site). panel b: density map of a single dna-bound drdv protomer with color coded domains: drdv is a single chain protein spanning residues, corresponding to an n-terminal endonuclease domain (light blue), a subsequent helical connector region (yellow), a central methyltransferase (‘mtase’) domain (purple) and a c-terminal target recognition domain (‘trd’) (pink). panel c: each enzyme subunit contains an s-adenosyl-methionine (‘sam’) cofactor (magenta) bound in its active site. the adenine base at position in the target is flipped into the mtase active site and is unmethylated. the base is contacted by three aromatic residues (y , f and f ) and a neighboring asparagine (n ) from the mtase domain. n and k occupy the space vacated by the flipped-out adenine. panels d and e: views of molecular model and corresponding electron density map in the enzyme-dna interface, with several residues that form additional sequence-specific contacts to the dna target site shown. several basic and polar residues from the methyltransferase (including k and k ) and the trd (including d and k ) contribute additional base-specific contacts in the target site. the adenine that is targeted for methylation (underlined ‘a’) is clearly flipped out of the dna duplex and positioned proximal to the bound sam cofactor; both moieties are clearly visible in the cryoem density maps. additional details of basepair-specific contacts are illustrated in supplemental figure s .
figure . formation of a drdv dimer bound to two individual dna targets positions their endonuclease domains near one strand of their partner’s bound dna duplex. the map shown is extracted from the larger tetrameric assemblage. panel a: two different views of the map.
the map corresponding to one enzyme subunit is colored solid blue, while the second is colored to indicate the enzyme’s individual domains. the primary interface between individual protein subunits, and the interface between the dna duplex bound by subunit b and the endonuclease domain of subunit a, are indicated by boxes labeled ‘c’ and ‘d’ and correspond to further detail illustrated in panels c and d below. panel b: ribbon diagram of the drdv core dimer, again indicating the location of interfaces between the protein subunits (largely via their helical connector regions) and between the nuclease domain of subunit a with dna target from subunit b. as illustrated in the adjacent cartoon, the endonuclease domains are swapped between subunits, such that each enzyme subunit positions its endonuclease domain in contact with the dna duplex bound by the opposite enzyme subunit. panel c: contacts between the endonuclease active site of subunit a and the dna duplex bound by subunit b, illustrating the coordination of a bound calcium ion by residues of the active site and by the scissile phosphate on the corresponding strand of dna. panel d: ribbon model and density illustrating the interface between the enzyme subunits. the interface is largely composed of two buried, symmetry-related clusters of opposing charged residues ( acidic side chains (d , e and d ) from one subunit and basic residues (k *, r * and k *) from the opposite subunit, and vice versa), augmented by similarly duplicated cation-pi interactions between r from one subunit and y * from the other, as well as an additional contact between r and q *. figure .
association of endonuclease domains in the enzyme trimer and tetramer assemblages results in three enzyme subunits being jointly involved in the cleavage of a single bound dna duplex. panel a: density map of the dna-bound enzyme tetramer, colored by subunit. the organization of the assemblage and corresponding coloring is also indicated in the cartoon schematic adjacent to the map. the endonuclease domains from subunits b and c (boxed and colored in shades of green) are jointly positioned directly adjacent to the scissile phosphates of the dna duplex bound by subunit a, with their active sites appropriately arranged to cleave the dna duplex downstream from the bound target site. the endonuclease domains from subunits a and d are similarly positioned to cleave the dna duplex bound by subunit b. panel b: orthogonal view of the packing and interactions between endonuclease domains shown in panel a, and corresponding density map overlaid with the atomic model, showing buried, symmetry-related clusters of opposing charged residues (e and r from one subunit, versus r and d from the other, and vice versa) interacting in a pairwise manner within a helical interface (located on the opposite side of the domains from their active site). panel c: the recognition, binding and cleavage of a single dna duplex involves contacts and interfaces formed between three separate dna-bound enzyme subunits.
supplemental figure s . enzyme purification and initial negative stain electron microscopy. panel a. sds-page of drdv at different stages of purification. see methods for full details of enzyme production. panel b. sec elution profiles of free and dna-bound drdv (red and blue curves, respectively; left) and elution profile of dna-bound drdv overlaid with the absorbance ratio at nm and nm, indicating formation and elution of the dna-bound enzyme complex. panel c.
negative stain electron microscopy of drdv apoenzyme and dna-bound complex. a: drdv in the absence of dna. b: negative-stained image of drdv in the presence of an equimolar amount of dna duplex (sequence provided in methods) containing the enzyme target recognition site (catggac) and basepairs downstream of the target. c: selected panels of the drdv dna complex. panel d: selected d particle classes. panel e: reconstructed low-resolution d model of negative-stained drdv-dna particles.
supplemental figure s . flow chart for cryoem analyses using a glacios microscope operating at kv. data were collected at a pixel size of . Å and processed using the package cryosparcv . for full details of data collection and processing approach, see methods.
supplemental figure s . flow chart for cryoem analyses using a krios microscope operating at kv. computational processing, d reconstruction and refinement correspond to data collected at a pixel size of . Å. for full details of data collection and processing approach, see methods.
supplemental figure s . em resolution and enzyme sequence versus structure. panel a. local resolution distribution of the density maps for the dimer, trimer and tetramer on the same scale. panel b. amino acid sequence and visualized secondary structure of a drdv subunit. panel c. pairwise secondary structure superposition of the two protomers of the core dimer (i), of the two protomers on the side (ii) and one each of the protomer in the center and on the side (iii), showing the conservation of the folding of all domains and their relative disposition in all the particles.
the only observable difference in the disposition is a hinged rigid-body rotation of the nuclease domain of the protomer on the side with respect to that of the protomer in the core dimers.
supplemental figure s . contacts between drdv and base pairs in the enzyme’s target site.
supplemental movie m . rotation and visualization of cryoem map and corresponding molecular model of the dna-bound drdv tetrameric assemblage.
supplemental movie m . morph from cryoem map of dna-bound drdv dimeric assemblage to dna-bound drdv trimeric assemblage.
supplemental movie m . morph from cryoem map of dna-bound drdv trimeric assemblage to dna-bound drdv tetrameric assemblage.
recognition of a tandem lesion by dna glycosylases explored combining molecular dynamics and machine learning
emmanuelle bignon ,*, natacha gillet , chen-hui chan , tao jiang , antonio monari , and elise dumont ,
univ. lyon, ens de lyon, cnrs umr , université claude bernard lyon , laboratoire de chimie, f , lyon, france
université de lorraine and cnrs, lpct umr , nancy, france
institut universitaire de france, rue descartes, paris
*emmanuelle.bignon@univ-cotedazur.fr
abstract
the combination of several closely spaced dna lesions, which can be induced by a single radical hit, constitutes a hallmark in the dna damage landscape and radiation chemistry.
the occurrence of such tandem base lesions gives rise to a strong coupling with the double helix degrees of freedom and induces important structural deformations, in contrast to dna strands containing a single oxidized nucleobase. although such complex lesions are known to be refractory to repair by dna glycosylases, there is still a lack of structural evidence to rationalize these phenomena. in this contribution, we explore, by numerical modeling and molecular simulations, the behavior of the bacterial glycosylase responsible for base excision repair (mutm), specialized in excising oxidatively-damaged defects such as , -dihydro- -oxoguanine ( -oxog). the difference in lesion recognition between a single damage and a tandem lesion featuring an additional abasic site is assessed at atomistic resolution using microsecond molecular dynamics simulations and machine learning postprocessing, which together pinpoint crucial differences in the interaction patterns of the damaged bases. this work advocates for the use of such high-throughput numerical simulations for exploring the complex combinatorial chemistry of tandem dna lesion repair and, more generally, of multiple damaged sites, which are of the utmost significance in radiation chemistry.
keywords: mutm, dna repair glycosylase, tandem lesion, molecular dynamics simulations
introduction
the chemical stability of dna components is fundamental to maintaining genome stability, hence preventing unwanted mutations or cell death. indeed, the accumulation of dna lesions has been recognized as one of the principal causes of cancer development. although dna maximizes its stability through its helical structure, its constituent nucleic acids are constantly exposed to damaging agents, either endogenous or exogenous, that inevitably lead to the production of lesions.
among the different sources of dna lesions, we may briefly recall oxidative agents, such as free radicals or reactive oxygen species (ros), uv light, and ionizing radiation. as a consequence, specific and highly efficient repair machineries exist that are able to recognize the presence of lesions in the genome and remove them to reinstate undamaged dna strands. specific dna repair pathways may depend on the organism, and are also related to the kind of lesion: for localized oxidatively-induced damage the base excision repair (ber) pathway is preferred, while for more extended and bulky lesions, such as base dimerization, the nucleotide excision repair (ner) mechanism is favored. yet this sophisticated repair mechanism has been reported to be strongly impaired when not only one but two adjacent dna lesions are located on the same strand, the so-called tandem lesions. the formation of tandem lesions can derive from a single radical hit, and their biological impact is now well established. while their formation mechanism has been delineated, the reasons underlying their resistance to repair are more elusive and should be analyzed taking specific structural modifications firmly into account. the most common oxidative tandem lesions feature two adjacent oxidized nucleobases. in the following we will specifically consider -oxoguanine ( -oxog) and an abasic apurinic/apyrimidinic site (ap), as shown in figure -c. this arrangement is particularly relevant because ap sites are also the most common outcome of ionizing radiation, following excision of an entire nucleobase. interestingly, ap sites also represent key intermediates of the ber machinery and result from the action of dna glycosylases before being further processed and removed by endonucleases.
(which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . .
the presence of a tandem lesion, or more generally of multiple damaged sites (mds), which are the hallmarks of radiation chemistry, induces strong coupling between the lesions that in turn translates into important structural deformations of the nucleic acid as compared to its ideal structure, i.e. either an undamaged strand or sequences containing an isolated lesion. the unusual structural deformations induced by tandem lesions or mds may also well justify their globally lower repair rate as compared to other lesions. to cope with their frequency, canonical dna lesions benefit from a most efficient repair. for instance, -oxog, which is well known to mismatch with adenine and hence is potentially mutagenic, is repaired by formamidopyrimidine dna glycosylase, an enzyme that is referred to as fpg in eukaryotes, while its bacterial counterpart is called mutm. the latter recognizes the presence of -oxog in the genome and specifically binds at the damaged site. many studies have helped to dissect the mode of action of mutm/fpg in the presence of a single -oxog, in particular concerning the recognition of the lesion. fpg has been shown to recognize -oxog among other oxidatively-induced lesions and to subsequently proceed to its extrusion, initiating the base excision process. the mechanisms of recognition and extrusion of -oxog have been scrutinized through a series of techniques, including molecular modeling and simulations, and are now relatively well characterized. recently, simmerling et al.
, while recognizing the role of the damaged base flipping in favoring its recognition, have also pinpointed the existence of preliminary recognition steps correlating with the rapid sliding of fpg along the dna strand, which is incompatible with a recognition mechanism based on the systematic flip of all the bases. in addition, the same authors have identified that -oxog flips preferably through the major groove. the free energy required for the extrusion of -oxog to an extrahelical position has also been estimated by la rosa and zacharias, also taking into account the contributions due to the dna global bending and twisting. a most important feature of mutm/fpg efficiency has been traced back to the crucial triad of amino acids m , r , and f . indeed, this triad disrupts -oxog interactions within the dna helix by intercalating above the -oxog position, thereby facilitating its extrusion towards the active site. in addition, several other important mutm/fpg residues (k , h , y , k , and r ) are known to stabilize the dna helix by interacting with its backbone.
figure . (a) cartoon representation of the bacterial mutm in interaction with a -bp double-stranded dna helix harboring -oxoguanine (og ) as the th nucleobase (pdb id code go ). the magnified section highlights the position of the catalytic triad (m , r , and f in green) and the residues interacting with the dna backbone (in orange) around the damage. (b) sequence of the -bp oligonucleotide, showing the position of the -oxoguanine (in red). in simulations with tandem lesions, dg is mutated in silico into an abasic site (ap). (c) chemical structure of the -oxoguanine and the abasic site lesions.
on the other hand, several studies have addressed the behavior of tandem-containing oligonucleotides, either from a biochemical and repair perspective or from a structural point of view, also relying on molecular modeling and simulations. globally, the different approaches agree in pointing out a strong effect of closely spaced lesions in modifying the structure and dynamics of the oligonucleotide. in addition, strong sequence effects, depending both on the relative position of the cluster lesions and on the nearby undamaged bases, contribute to the complexity of the global landscape. the interaction of mds-containing oligonucleotides with repair enzymes, and in particular with both e. coli and human endonucleases, has been reported. the perturbations exerted by the secondary lesion on the protein/dna contact regions, and the consequent decrease in its repair efficiency, as observed for some particular tandem lesions, have also been highlighted. however, no analysis of the structural behavior of fpg and mutm in the presence of tandem dna lesions has been reported, despite the relevance that such lesions may assume in conditions of strong oxidative stress or ionizing radiation. in this work, we take advantage of the existing knowledge of -oxog recognition by mutm to investigate the structural and dynamic impact of the presence of a second, adjacent lesion, namely an ap site. relying on all-atom, explicit-solvent molecular dynamics, we simulate the structural and dynamical behavior of a mutm:dna complex. we consider the situation in which only a single lesion ( -oxog) is present and compare it to the one in which the adjacent guanine base
dg , present in the x-ray structure (pdb id go ), has been in silico mutated to an ap site - see figure -b. we clearly show that the presence of the tandem lesion induces important structural deformations of the dna that significantly perturb the protein/nucleic acid interaction pattern and are therefore likely to alter the -oxog extrusion. we extensively describe the changes in the -oxog lesion structural signature upon the presence of an adjacent ap site and the perturbation of the interaction network with mutm (fpg), which ultimately contribute to diminishing the recognition efficiency.
results
we report the structural and dynamical properties of the bacterial mutm interacting with a -bp dna sequence harboring either a single -oxog at position (as found in the crystal structure pdb id go ) or -oxog coupled with an ap site at position , along two replicas reaching µs md simulation time each. the numbering of the nucleic acids used hereafter corresponds to the one in figure -b; the numbering of mutm residues refers to the crystallographic structure (pdb id go ).
tandem lesions impact the interaction network around -oxog
the interaction network as found in the mutm:dna crystal containing a single -oxog lesion is conserved and stable along our md simulations. a most important structural feature in mutm is its intercalation triad, consisting of the m , r and f residues. those three amino acids are located around the -oxog in the minor groove, weakening the stabilizing interactions of the lesion within the double helix to facilitate its extrusion. r interacts with the complementary dc , while m and f intercalate directly above -oxog and disrupt the stable π-stacking with the adjacent base pair – see figure -a. these interactions are persistent along the entire md simulations of the singly-damaged system. f is involved in π-stacking with dg during . % of the time series, with the distance between heavy atoms of their aromatic rings averaging at . ± . Å.
the rest of the time, f stacks transiently with dc facing dg and their aromatic rings maintain a distance of . ± . Å along the simulations – see figure . m intercalates between og and dg , as its terminal methyl group remains at . ± . Å of the n atom involved in the n-glycosidic bond, and is ideally positioned to act on the og deoxyribose moiety to drag it outwards.
figure . cartoon representation of mutm interacting with the dna helix harboring a single -oxog lesion (og , a) or tandem lesions -oxog + ap (og and ap , b). h-bonds are depicted as dashed pink lines and the dna structure is rendered transparent for the sake of clarity. upon multiple lesions, the interaction pattern around -oxog (og ) is perturbed. the intercalation triad m /r /f is shifted down by r , which comes to interact between ap and the facing dc , preventing m and f intercalation above -oxog. r , normally interacting with the dna backbone phosphates between positions and , is now involved in hydrogen bonding with the og carbonyl.
r side chain amino groups form h-bonds with the nitrogen and carbonyl of dc over . % of the simulation time, the distance between these two atom groups being . ± . Å. several other amino acids have been identified as stabilizing the mutm:dna complex by interacting with the negatively-charged phosphate groups of the backbone, namely k , h , y , k , and r . these interactions are stable in our simulations and the highly conserved r forms strong h-bonds between og and dg phosphate groups - as shown in figure s . notably, r is known to play a role in -oxog extrusion. the structural behavior of mutm:dna( -oxog) observed here corroborates the hypothesis of a highly dynamic system, whose functional flexibility is known to be central to ensuring its biological role through the recognition and extrusion of -oxog.
the dg → ap mutation induces a clear perturbation of this well-characterized interaction network. a first consequence is the perturbation of the dynamics of the intercalation triad. the presence of the abasic site involves, in the first ns of simulation, a rapid reorientation of r situated just above the intercalation triad. the r side chain turns towards the damaged site, and is found closer to og , at . ± . Å vs. . ± . Å observed in the duplex containing a single -oxog. r does not interact directly with og but rather positions itself in the gap between the ap site and the facing dc , bridging the two residues through stable h-bonds as reported in figure -b. the distance between the r guanidinium nitrogens and the dc /ap h-bond acceptor atoms lies at . ± . Å and . ± . Å, respectively, in the tandem-damaged mutm:dna complex. comparatively, the dc -r distance is . ± . Å in the singly-damaged system – see figure -b. the reorientation of r reshapes the canonical interaction network of the intercalation triad, which is globally shifted down the duplex. f is pushed away from position and comes closer to the opposite strand: the dc -f distance drops to . ± . Å, although its strong cation-π interaction with r prevents direct stacking with dc . additionally, the distance between the r guanidinium extremity and the f aromatic ring is . ± . Å vs. . ± . Å in the singly-damaged system, while the interaction of r with the estranged dc is destabilized. in the presence of tandem lesions, r lies further from dc ( . ± . Å) than what is observed for the singly-damaged complex ( . ± . Å). the intercalation of m is prevented in the presence of the tandem lesion since its terminal methyl group rotates away from og :n ( . ± .
Å), while the interaction with m corresponds to a more rigid binding mode, with the formation of an h-bond between the sulfur atom and one hydrogen of og . the corresponding distance is reduced to . ± . Å vs. . ± . Å with the singly-damaged (og ) duplex.
figure . distribution of relevant distances involving the intercalation triad (a) and r (b) upon a single -oxog mutation (single) or ap + -oxog lesions (tandem). the presence of the ap site at position makes m and r move away from -oxog and the facing dc . f π-stacks with dc because the nucleobase at position is now absent. r comes closer to -oxog and intercalates in the gap left by the abasic site, with formation of very stable h-bonds bridging ap and the facing dc . it also interacts with f , preventing it from stacking within the double helix.
interactions between the dna backbone and mutm tend to be more rigid upon tandem damage than in the singly-damaged duplex. the h-bond between k and the phosphate at position is stronger, as witnessed by the –nh +... p distance that is reduced to . ± . Å vs. . ± . Å for the singly-damaged system, as are the interaction of h with da (nh - p distance of . ± . Å vs. . ± . Å with the isolated -oxog) and the y h-bond with og (oh - p distance of . ± . Å vs . ± . Å). however, the interaction of r with the dna helix is strongly perturbed: in the singly-damaged system, r forms stable h-bonds with og and dg phosphates (cz - p distance of . ± . Å and . ± . Å, respectively) which are disrupted upon the presence of the additional ap site (cz - p distance of . ± . Å and . ± . Å, respectively).
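the averaged distances and time fractions quoted in this section are per-frame statistics over a trajectory. the sketch below illustrates one way such numbers can be obtained: it computes the distance between the centroids of two atom groups for each frame, then reports the mean, the standard deviation and the fraction of frames at or below a cutoff. the toy coordinates, the cutoff and all function names are hypothetical and are not taken from the simulations reported here; using group centroids is an assumption made for illustration, not necessarily the definition used by the authors.

```python
import math
from statistics import mean, pstdev


def centroid(atoms):
    """geometric center of a list of (x, y, z) coordinates."""
    n = len(atoms)
    return tuple(sum(atom[i] for atom in atoms) / n for i in range(3))


def group_distance(group_a, group_b):
    """distance between the centroids of two atom groups, in the input units."""
    return math.dist(centroid(group_a), centroid(group_b))


def distance_summary(frames, cutoff):
    """summarize a per-frame distance series.

    frames: list of (group_a, group_b) coordinate lists, one pair per frame.
    cutoff: distance at or below which the two groups count as 'in contact'
            (hypothetical value chosen by the user).
    """
    distances = [group_distance(a, b) for a, b in frames]
    in_contact = sum(1 for d in distances if d <= cutoff)
    return {
        "mean": mean(distances),
        "std": pstdev(distances),  # population standard deviation
        "occupancy": in_contact / len(distances),
    }


# toy trajectory: four frames, two atoms per group (made-up coordinates)
ring_a = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]  # centroid at (1, 0, 0)
toy_frames = [
    (ring_a, [(1.0, 0.0, 3.0), (1.0, 0.0, 5.0)]),
    (ring_a, [(1.0, 0.0, 2.0), (1.0, 0.0, 4.0)]),
    (ring_a, [(1.0, 0.0, 5.0), (1.0, 0.0, 7.0)]),
    (ring_a, [(1.0, 0.0, 2.0), (1.0, 0.0, 4.0)]),
]
summary = distance_summary(toy_frames, cutoff=4.5)
print(summary)
```

in practice the per-frame coordinates would come from a trajectory reader; the summary step itself does not depend on how the coordinates are loaded.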
the r position experiences important fluctuations in the tandem-damaged complex, and can form a stable h-bond with og :o - see figure -b. the r :cz - og :o distance is below Å for % of the simulation time in the tandem-damaged complex, while in the singly-damaged complex such a short distance is observed for only % of the time - see figure s . this first local analysis suggests that the singly- and tandem-damaged -bp duplexes present different interaction patterns, with non-trivial changes in the binding mode and its dynamics. in order to probe more extensively the structural and dynamic consequences of the dg → ap substitution, we have relied on a recently proposed machine-learning protocol to identify other residues possibly involved in the recognition mechanism.
systematic assessment of interacting residues through machine-learning protocol
in order to probe the residues that exhibit important interactions with the dna duplex, a machine-learning protocol based on the multilayer perceptron (mlp) classifier was set up. the latter generates a "footprint" of the residues that are particularly involved in mutm:dna bonding – see figure . a score function, in the following referred to as ’importance’, is attributed to each residue: the higher the score, the higher the contribution to the mutm:dna complex stabilization. using an importance threshold of . , and residues out of stand out in the singly- and tandem-damaged systems, respectively. the three residues of the intercalation triad (i.e. m , r and f ) show a slightly higher contribution in the tandem- ( . , . , . ) than in the singly-damaged system ( . , . , . ). r and q , adjacent to m in the mutm sequence, also present high values. as highlighted by the visual inspection of our md trajectories, in the tandem-damaged system, r flips towards the lesion site to compensate for the nucleobase removal at position by bridging ap to the facing dc through strong h-bonds. the importance score for r is . with tandem lesions vs .
with isolated -oxog, corroborating the significant role of this residue in mutm:dna binding upon the presence of ap , in line with the newly formed and very stable h-bonds with the lesion site. the importance of q is higher than the threshold in both tandem- ( . ) and singly- ( . ) damaged systems. this residue interacts with r , contributing to the h-bond network in the vicinity of the lesion. adjacent to f , the importance of g in the stabilization of the complex is also enhanced upon dg → ap mutation ( . in tandem vs. . with isolated -oxog). additional visual inspection of the md trajectories reveals that g forms a strong h-bond with r , helping to maintain the latter intercalated between ap and the facing dc .
figure . importance of the contribution of residues to the mutm:dna complex bonding for the singly-damaged (blue) and the tandem-damaged (orange) systems. the threshold value above which the importance of the residue for the stabilization of the complex is considered significant is . . some of the key residues as well as the flexible loop region are indicated by arrows. contributions of amino acids to the bonding are mostly higher upon -oxog/ap combination, suggesting a more rigid complex upon multiple damage sites than with an isolated -oxog.
of the five key residues reported to anchor the dna phosphate backbone (k , h , y , k , and r ), only the closest to og are associated with importance scores above the threshold of . : y ( . and . for the singly- and tandem-damaged system), k ( . and . ) and r ( . and . ). the g and y residues also contribute to the h-bonding with the dna backbone. n shows high importance values, . and .
for singly- and tandem-damaged systems; this is due either to its interaction with the damaged site backbone or to indirect coupling with r , as previously described in the literature . i is involved in hydrophobic interactions with y that in turn interacts with og backbone. among other residues whose contribution is above the threshold, r forms h-bonds with either dc or dc backbone, p and m interact with dt :p, d and r maintain the ’-terminus backbone of the dna strand (in the dg and dt surroundings), l stabilizes the position of the key residue r , while g and g form h-bonds with r or directly with the dna backbone, and k interacts with the dc phosphate. globally, the mlp analysis clearly reveals that the protein residues between positions and exhibit the highest values of importance. they correspond mostly to a large, flexible loop, comprising the residues – at the c-terminus, that is prone to disorder but is also known to be implicated in dna recognition despite being spatially far from the double helix . amino acids at the n-terminus also show significant contributions to the mutm:dna bonding. the proline located at the very end of the n-terminal region has an important catalytic role since it reacts with the c ’ atom of the deoxyribose sugar moiety of the -oxoguanine to form a schiff base, and hence it induces the cleavage of the n-glycosidic bond, which constitutes the first step of the repair process. adjacent to p , the vicinal e is also known to play a role in mutm catalytic efficiency. interestingly, the contribution of these two residues to the mutm:dna stability decreases from . in the singly-damaged to . for the tandem-damaged complex, hence corroborating a subtle reduction of the excision efficiency. other residues of the n-terminus (l , p , e ) also show a drop in their contribution upon dg → ap mutation.
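the comparison of per-residue importance between the two systems can be sketched in a few lines of python. this is only an illustration under stated assumptions: the residue names, the scores and the threshold value below are hypothetical placeholders (not the values of this study), and the helper names `above_threshold` and `importance_shift` are introduced here for the example.

```python
# Hypothetical per-residue importance scores (illustrative values only,
# NOT the data reported in this work).
single = {"res_a": 0.10, "res_b": 0.45, "res_c": 0.30}
tandem = {"res_a": 0.35, "res_b": 0.50, "res_c": 0.20}

THRESHOLD = 0.25  # placeholder cutoff, assumed for the example


def above_threshold(scores, threshold):
    """Return the sorted list of residues whose importance reaches the threshold."""
    return sorted(res for res, score in scores.items() if score >= threshold)


def importance_shift(scores_single, scores_tandem):
    """Return tandem-minus-single importance for residues present in both systems."""
    return {
        res: scores_tandem[res] - scores_single[res]
        for res in scores_single
        if res in scores_tandem
    }


selected_single = above_threshold(single, THRESHOLD)
selected_tandem = above_threshold(tandem, THRESHOLD)
shift = importance_shift(single, tandem)
```

a positive shift marks a residue whose contribution grows in the tandem-damaged system, a negative one a residue whose contribution drops.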
the residues that stand out in this mlp analysis match very well with the ones identified in previous works on mutm and fpg , , , , , . our machine-learning post-processing makes it possible to disentangle a complex interaction pathway, which is already well-established for -oxog-containing dna but is perturbed in the presence of tandem lesions, as revealed by the present simulations. it generates an exhaustive map of the residues that are important for the protein-dna interactions, beyond the simple visual investigation based on the data from the literature. notably, the nucleic acid importance score in the mutm:dna bonding is enhanced in the presence of ap , denoting again a more constrained oligonucleotide - see figure s .

mechanical and dynamic properties of the dna strand

in order to assess the mechanical and dynamic properties of the dna strand, the md trajectories were post-processed with the curves+ program to evaluate the structural parameters of the double helix. the first signature of the b-helix is often the bend angle, which reaches typical values around ± ◦ upon interaction with mutm for the singly-damaged oligonucleotide. such extreme values for bending are typical , and necessary to facilitate the extrusion of the lesion towards the enzyme active site. the presence of the ap site at position is not sufficient to perturb the global bending of the -bp oligonucleotide ( ± ◦), but rather induces local deformations.

dc -og parameter | single | tandem
local bending (◦) | . ± . | . ± .
tip (◦) | . ± . | . ± .
inclination (◦) | . ± . | . ± .
buckle (◦) | - . ± . | - . ± .
propel (◦) | . ± . | . ± .
opening (◦) | . ± . | . ± .
shear (Å) | - . ± . | - . ± .
stretch (Å) | . ± . | . ± .
stagger (Å) | . ± . | . ± .

table . averaged values of the dc -og base-pair structural parameters, for the single -oxog (single, left) and the tandem -oxog+ap (tandem, right).
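averages of the kind listed in the table (mean ± standard deviation over the trajectory frames) can be obtained with a short script. the sketch below is an assumption about the post-processing, not the authors' code: it supposes that the per-frame values of a parameter have already been extracted (for instance from the curves+ output) into a plain list, and the numbers used are invented for illustration.

```python
import statistics


def summarize(values):
    """Return (mean, sample standard deviation) of a per-frame series."""
    return statistics.mean(values), statistics.stdev(values)


def format_summary(values, unit=""):
    """Format a per-frame series as 'mean ± std' with one decimal."""
    mean, std = summarize(values)
    return f"{mean:.1f} ± {std:.1f}{unit}"


# Hypothetical per-frame buckle angles in degrees (illustrative only).
buckle_frames = [-10.0, -12.0, -8.0, -10.0]

buckle_mean, buckle_std = summarize(buckle_frames)
buckle_text = format_summary(buckle_frames, unit="°")
```

the sample standard deviation (`statistics.stdev`) is used here; whether the original averages rely on the sample or the population estimator is not stated in the text.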
structural parameters of the dc -og base-pair are particularly impacted, with values lower for the tandem- than for the singly-damaged system - see table . importantly, the backbone parameters ’bend’, ’tip’ and ’inclination’ are lower when the ap site is present at position , denoting a straighter portion of dna helix than what is normally found in the canonical singly-damaged mutm:dna complex - see figure and figure s . the values monitored for these parameters are . ± . ◦, . ± . ◦, and . ± . ◦, respectively, in the singly-damaged system, vs . ± . ◦, . ± . ◦, and . ± . ◦ in the tandem-damaged complex. besides, several intra base-pair structural parameters are also found closer to the canonical b-dna for the dc -og base-pair. in particular, the ’buckle’ and ’propeller’ drop from - . ± . ◦ to - . ± . ◦ and from . ± . ◦ to . ± . ◦, respectively, upon dg → ap mutation. other parameters (opening, shear, stretch, and stagger) show less significant deviations - see table and figure s . notably, qi et al. reported a change in puckering values upon oxidation of a guanine residue. -oxog would exhibit a c ’-exo puckering while a canonical dg ribose moiety would harbor a c ’-endo conformation. this would promote the recognition of -oxog by mutm. in our simulations, the frequency of the c ’-endo conformation of og is increased by the presence of the tandem lesion compared to a single -oxog ( . % and . %, respectively). however, rather than the c ’-exo puckering ( . % and . % for tandem- and singly-damaged), the c ’-exo is the main or second most populated conformation ( . % and . %) - see figure s .
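puckering populations such as those quoted above are typically obtained by assigning each frame to a conformation from the sugar pseudorotation phase angle and counting the frames. the sketch below uses the conventional partition of the pseudorotation wheel into ten 36° sectors; that the same binning was used in this work is an assumption, and the phase values are invented for illustration.

```python
from collections import Counter

# Conventional 36-degree sectors of the pseudorotation wheel, starting at P = 0.
PUCKER_SECTORS = [
    "C3'-endo", "C4'-exo", "O4'-endo", "C1'-exo", "C2'-endo",
    "C3'-exo", "C4'-endo", "O4'-exo", "C1'-endo", "C2'-exo",
]


def pucker_label(phase_deg):
    """Map a pseudorotation phase angle (degrees) to its pucker sector."""
    index = int((phase_deg % 360.0) // 36.0)
    # Guard against floating-point wrap-around giving exactly 360.0.
    return PUCKER_SECTORS[min(index, len(PUCKER_SECTORS) - 1)]


def pucker_frequencies(phases):
    """Return the percentage of frames found in each populated pucker sector."""
    counts = Counter(pucker_label(p) for p in phases)
    total = len(phases)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical per-frame phase angles (illustrative only).
phases = [10.0, 20.0, 160.0, 170.0]
frequencies = pucker_frequencies(phases)
```

with these four made-up frames, half of the trajectory falls in the c3'-endo sector and half in the c2'-endo sector.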
concerning the inter base-pair parameters, dna structural values are comparable for singly- and tandem-damaged systems and in agreement with previous works . as could be expected though, the absence of the nucleobase at position upon mutation to an ap site influences the stability of the canonical stacking that is usually conserved in the singly-damaged complex. this is reflected in the distribution of the parameter values, which is much broader in the presence of ap at position - see figure s . this highlights the blurrier structural signature exhibited by the tandem-damaged dna helix, which is another criterion that might affect the interaction with the surrounding amino acids, and hence the efficiency of the -oxog extrusion by mutm.

[figure: distributions of inclination (°), propeller (°) and tip (°); panels: tandem, single]
figure . distribution of three characteristic dna helix intra base-pair parameters for dc -og over µs md simulation, for a single -oxoguanine (single, red, top) and both -oxoguanine and ap site (tandem, blue, bottom). the structural deformation with respect to canonical b-dna is globally smaller for the tandem-damaged than the singly-damaged complex.

discussion

mutm, the bacterial analog of the human fpg, is responsible for the recognition and repair of the very common -oxog lesion. the fpg(mutm):dna interface has been investigated by nmr, x-ray and molecular dynamics simulations, probing the key residues that play a crucial role in the specific recognition of -oxog, but also of other dna lesions , , – , , – . an intercalation triad (m , r , f ) has been characterized, and several other residues are known to be essential in mutm:dna interactions and -oxog extrusion, guiding the lesion towards the n-terminal proline responsible for the schiff base formation. intrahelical insertion of a single f wedge residue , , is marked and allows a slow scanning of the double helix by mutm and analog enzymes.
among mutm key residues, r , located in the zn-finger domain, is highly conserved and important for -oxog extrusion , . n also plays a key role and its mutation leads to the perturbation of the r contacts , . besides, the c-terminal flexible loop is known to be essential for the -oxog recognition by folding over the lesion in a capping process , . while the recognition and repair of single -oxog by mutm are well documented, their perturbation in the presence of tandem lesions is very poorly understood. however, it has been shown that ionizing radiation can lead to the formation of tandem lesions , rendering -oxog refractory to excision by glycosylases , . such multiple damaged sites are highly mutagenic and increase the risk of cancer development , . they can also be cytotoxic, as their error-prone repair can result in the formation of deleterious double-strand breaks , . notably, the high toxicity of the dna lesions induced by ionizing radiation is also exploited for the development of cancer (radio)-therapies . in this context, we investigated the structural impact of tandem lesions on the interactions between mutm and a -bp oligonucleotide harboring the -oxog lesion at position . using molecular modeling and machine-learning analysis, we highlighted a structural re-organization of the mutm canonical interaction network around -oxog in the presence of an adjacent abasic site at position . the interaction network involving the intercalation triad and the damage is perturbed by the dg → ap mutation. the mutm:dna interactions are more pronounced, leading to a more rigid system, which could explain the difficulty of mutm to process such multiple damaged sites.
while the classical interaction patterns are observed in the simulation of the singly-damaged system, the presence of an additional ap site results in the rotation of r , which provokes a shift of the intercalation triad. notably, as r is poorly conserved in mutm sequences from different organisms , , one cannot rule out the possibility of a different reorganization around the intercalation triad. first observations of our md trajectories made it possible to describe the re-shaping of the mutm:dna interaction patterns - see figure and . the structural analysis of the dna oligonucleotide also reveals changes in the local conformation of the lesion site (see figure and table ), which might jeopardize the efficiency of -oxog recognition by the enzyme. in order to go beyond the visual observation of mutm:dna interactions, we applied machine learning (ml) techniques to provide an extensive map of these contacts. ml methods have gained an enormous amount of attention in recent years. their power in extracting important information from large amounts of data has been exploited by the biochemistry community, and many interesting applications have been showcased in the literature. recently, fleetwood et al. have demonstrated their capability in learning ensemble properties from molecular simulations and providing easily interpretable metrics describing important structural or chemical features. the machine-learning analysis of our trajectories is based on the demystifying package from fleetwood et al. . residues highlighted by the mlp analysis as providing a significant contribution to the mutm:dna bonding are in agreement with data from the literature. comparison of the residue importance in mutm:dna interactions upon single or tandem lesions made it possible to pinpoint the changes in the interaction patterns, which concern the most important features of mutm - see figure .
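the inputs of such an mlp classifier, as described in the materials and methods (inverse distances between residue geometric centers, and a bound/non-bound label per frame), can be sketched as follows. this is a minimal illustration and not the demystifying package itself: the function names are hypothetical, the cutoff is a placeholder, and the convention 1 = bound, 0 = non-bound is an assumption.

```python
import math
from itertools import combinations


def geometric_center(atoms):
    """Return the unweighted centroid of a list of (x, y, z) coordinates."""
    n = len(atoms)
    return tuple(sum(atom[i] for atom in atoms) / n for i in range(3))


def inverse_distance_features(residues):
    """Return 1/d for every residue pair, d being the distance between centroids.

    `residues` is a list of residues, each given as a list of (x, y, z) atoms.
    Coinciding centroids would divide by zero; this is not handled here.
    """
    centers = [geometric_center(res) for res in residues]
    return [1.0 / math.dist(a, b) for a, b in combinations(centers, 2)]


def label_frame(lesion_protein_distance, cutoff):
    """Return 1 (bound) if the distance is below the cutoff, else 0 (non-bound)."""
    return 1 if lesion_protein_distance < cutoff else 0


# One hypothetical frame with three toy residues (coordinates in Å).
frame = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (0.0, 0.0, 3.0)],
    [(4.0, 0.0, 0.0)],
]
features = inverse_distance_features(frame)
label = label_frame(lesion_protein_distance=3.2, cutoff=5.0)  # placeholder cutoff
```

each frame then contributes one feature vector and one label to the training set of the classifier.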
apparently in contradiction with common chemical sense, the mlp analysis revealed that the dg → ap mutation leads to stronger, more stable interactions between the two macromolecules. the contribution of the residues involved in dna anchoring is almost systematically increased in the tandem-damaged system. nucleic acids also exhibit stronger interactions with mutm in the case of the tandem lesion, which overall suggests that the presence of a second damage results in a more rigid complex than when an isolated -oxog is present. however, the global rigidity of the tandem-damaged mutm:dna complex can actually be counterproductive for repair, since it has been shown that flexibility of the dna strand is a key feature correlating with -oxog removal . this consideration is further reinforced by the fact that, conversely, the catalytic n-terminal residues are less involved in the mutm:dna complex stability in the case of tandem-damaged nucleotides. this is also the case for the - loop region, which is known to play a key role in -oxog extrusion. hence, the presence of the ap site alongside the -oxog lesion impacts the canonical structural behavior of these two important mutm regions, which might also contribute to the lower repair efficiency. our study provides an example of the predictive power of all-atom md simulations coupled to machine learning analysis, applied to a very challenging test case. indeed, the combination of oxidatively-generated dna lesions gives rise to a combinatorial chemistry, with contrasted structural, mechanical and dynamic properties. additionally, mutm/fpg are very flexible proteins , certainly difficult to sample properly. the efficiency of our protocol opens perspectives for its extension towards other tandem systems and the investigation of sequence effects , , . furthermore, the biological significance of rationalizing this complex scenario is also unquestionable.
indeed, ionizing radiation can be satisfactorily exploited in cancer therapy, and the inhibition of repair enzymes by combined chemotherapy can prove a most valuable synergy in assuring the accumulation of lesions necessary to reach the apoptosis threshold. understanding the molecular mechanisms underlying dna repair is thus also crucial for offering novel perspectives in cancer research.

materials and methods

all-atom molecular dynamics simulations

all md simulations were performed with the amber and ambertools packages . the starting x-ray structure of bacillus stearothermophilus mutm was taken from the structure obtained by verdine and coworkers , pdb id code go . the crystallographic self-complementary ds-dna is a -bp sequence d(gtagatccgacg). (cgtccggatct) featuring -oxog as the th nucleobase (in bold). it should be noted that the β f-α loop – of mutm, absent from the crystal structure, was reconstructed using modeller. the zinc atom present in the zinc-finger motif of mutm was kept and described with parameters taken from the zinc amber force field (zaff) developed by merz and coworkers . potassium ions were added to neutralize the mutm:dna complex, which was embedded in a x x Å bath of tip p water molecules. the amber ff sb force field was used throughout, including the bsc force field corrections for the dna duplex . the parameters for -oxog and the ap site were generated with a standard antechamber procedure embedded in amber , as described in previous references – and in agreement with the literature. four , -step minimization runs were carried out on the initial mutm:dna complex, imposing restraints on the amino and nucleic acids that were gradually decreased from to kcal/mol/Å along the four runs.
the temperature was then raised from to k in a ps thermalization step, and afterwards kept constant using the langevin thermostat with a collision frequency γ ln of ps− . the system was subjected to a ns equilibration run in the npt ensemble. finally, two replicas of µs production runs were performed to sample the conformational ensemble of the system. the particle mesh ewald method was used to treat electrostatic interactions, with a . Å cutoff. the structural descriptors of the dna helix were evaluated based on a post-processing analysis with curves+ , and other distance and rmsd values were monitored using ambertools.

multilayer perceptrons analysis

the multilayer perceptron (mlp) is a fully connected artificial neural network (ann) with one input layer, one output layer and at least one hidden layer. after tests, the architecture of the mlp was chosen to contain a single hidden layer of neurons to provide good accuracy. the rectified linear unit function (relu) was used for the activation of neurons, and the adam algorithm was used for optimization. the inverses of the distances between the geometric centers of the residues were used as the input features for the mlp neural network, due to their better overall performance over cartesian coordinates, according to fleetwood et al . these internal coordinates were computed for all residue pairs and all frames. each frame of the trajectories was labelled as either or according to whether the distance between the dna lesion(s) and the protein is lower (bound) or higher (non-bound) than Å. these sets of input features and labels were fed to the mlp classifier for training. upon completion of the training, layerwise relevance propagation (lrp) was performed to identify the important features of the dna/mutm interface.

acknowledgements

support from ens de lyon is gratefully acknowledged.
this work was performed within the framework of the labex primes (anr- -labx- ) of université de lyon, within the program "investissements d’avenir" (anr- -idex- ) operated by the french national research agency (anr). references . basu, a. k. dna damage, mutagenesis and cancer. int. j. mol. sci. , , doi: . /ijms ( ). . cadet, j. & davies, k. j. a. oxidative dna damage & repair: an introduction. free. radic. biol. medicine , – , doi: . /j.freeradbiomed. . . ( ). . chatterjee, n. & walker, g. c. mechanisms of dna damage, repair, and mutagenesis. environ. mol. mutagen. ( ), – , doi: . /em. ( ). . david, s. s., o’shea, v. l. & kundu, s. base-excision repair of oxidative dna damage. nature , – , doi: . /nature ( ). . fortini, p. et al. -oxoguanine dna damage: at the crossroad of alternative repair pathways. mutat. res. mol. mech. mutagen. , – , doi: . /j.mrfmmm. . . ( ). . hong, i. s., carter, k. n., sato, k. & greenberg, m. m. characterization and mechanism of formation of tandem lesions in dna by a nucleobase peroxyl radical. j. am. chem. soc. , – , doi: . /ja ( ). . cadet, j. & wagner, j. r. dna base damage by reactive oxygen species, oxidizing agents, and uv radiation. cold spring harb. perspectives biol. , doi: . /cshperspect.a ( ). . bergeron, f., auvré, f., radicella, j. p. & ravanat, j.-l. ho• radicals induce an unexpected high proportion of tandem base lesions refractory to repair by dna glycosylases. proc. natl. acad. sci. , – , doi: . /pnas. ( ). . georgakilas, a. g., o’neill, p. & stewart, r. d. induction and repair of clustered dna lesions: what do we know so far? radiat. res. , – , doi: . /rr . ( ). . gattuso, h. et al. repair rate of clustered abasic dna lesions by human endonuclease: molecular bases of sequence specificity. the j. phys. chem. lett. , – , doi: . /acs.jpclett. b ( ). . bignon, e. et al. correlation of bistranded clustered abasic dna lesion processing with structural and dynamic dna helix distortion. nucleic acids res. , – , doi: . 
/nar/gkw ( ). . noguchi, m., urushibara, a., yokoya, a., o’neill, p. & shikazono, n. the mutagenic potential of -oxog/single strand break-containing clusters depends on their relative positions. mutat. res. mol. mech. mutagen. , – , doi: . /j.mrfmmm. . . ( ). . morland, i. et al. human dna glycosylases of the bacterial fpg/mutm superfamily: an alternative pathway for the repair of -oxoguanine and other oxidation products in dna. nucleic acids research , – , doi: . /nar/gkf ( ). . serre, l., pereira de jésus, k., boiteux, s., zelwer, c. & castaing, b. crystal structure of the lactococcus lactis formamidopyrimidine-dna glycosylase bound to an abasic site analogue-containing dna. the embo j. , – , doi: . /emboj/cdf ( ). . amara, p., serre, l., castaing, b. & thomas, a. insights into the dna repair process by the formamidopyrimidine-dna glycosylase investigated by molecular dynamics. protein sci. , – , doi: . /ps. ( ). . la rosa, g. & zacharias, m. global deformation facilitates flipping of damaged -oxo-guanine and guanine in dna. nucleic acids res. , – , doi: . /nar/gkw ( ). . qi, y., spong, m. c., nam, k., karplus, m. & verdine, g. l. entrapment and structure of an extrahelical guanine attempting to enter the active site of a bacterial dna glycosylase, mutm. j. biol. chem. , – , doi: . / jbc.m . ( ). . michaels, m. l., pham, l., cruz, c. & miller, j. h. mutm, a protein that prevents g c→t a transversions, is formamidopyrimidine-dna glycosylase. nucleic acids res. , – , doi: . /nar/ . . ( ). . fromme, j. c. & verdine, g. l. 
dna lesion recognition by the bacterial repair enzyme mutm. j. biol. chem. , – , doi: . /jbc.m ( ). . fromme, j. c. & verdine, g. l. structural insights into lesion recognition and repair by the bacterial -oxoguanine dna glycosylase mutm. nat struct mol biol , – , doi: . /nsb ( ). . qi, y. et al. encounter and extrusion of an intrahelical lesion by a dna repair enzyme. nature , – , doi: . /nature ( ). . li, h. et al. a dynamic checkpoint in oxidative lesion discrimination by formamidopyrimidine–dna glycosylase. nucleic acids res. , , doi: . /nar/gkv ( ). . hazel, r. d., tian, k. & de los santos, c. nmr solution structures of bistranded abasic site lesions in dna. biochemistry , – , doi: . /bi t ( ). . /bi t. . fujimoto, h. et al. molecular dynamics simulation of clustered dna damage sites containing -oxoguanine and abasic site. j. comput. chem. , – , doi: . /jcc. ( ). . cleri, f., landuzzi, f. & blossey, r. mechanical evolution of dna double-strand breaks in the nucleosome. plos comput. biol. , – , doi: . /journal.pcbi. ( ). . harrison, l., hatahet, z., purmal, a. a. & wallace, s. s. multiply damaged sites in dna: interactions with escherichia coli endonucleases iii and viii. nucleic acids res. , – , doi: . /nar/ . . ( ). . pérez, a., luque, f. j. & orozco, m. frontiers in molecular dynamics simulations of dna. accounts chem. res. , – , doi: . /ar ( ). . amara, p. & serre, l. functional flexibility of bacillus stearothermophilus formamidopyrimidine dna-glycosylase. dna repair , – , doi: . /j.{dna}rep. . . ( ). . landová, b. & Šilhán, j. conformational changes of dna repair glycosylase mutm triggered by dna binding. febs lett. , – , doi: . / - . ( ). . fleetwood, o., kasimova, m. a., westerlund, a. m. & delemotte, l. molecular insights from conformational ensembles via machine learning. biophys. j. , – , doi: . /j.bpj. . . ( ). . gilboa, r. et al. structure of formamidopyrimidine-dna glycosylase covalently complexed to dna. j. biol. chem. , – , doi: . /jbc.m ( ). . 
lavery, r., moakher, m., maddocks, j. h., petkeviciute, d. & zakrzewska, k. conformational analysis of nucleic acids revisited: curves+. nucleic acids res. , – , doi: . /nar/gkp ( ). . friedman, j. i. & stivers, j. t. detection of damaged dna bases by dna glycosylase enzymes. biochemistry , – , doi: . /bi a ( ). . sugahara, m. et al. crystal structure of a repair enzyme of oxidatively damaged dna, mutm (fpg), from an extreme thermophile, thermus thermophilus hb . the embo j. , – , doi: . /emboj/ . . ( ). . buchko, g. w., mcateer, k., wallace, s. s. & kennedy, m. a. solution-state nmr investigation of dna binding interactions in escherichia coli formamidopyrimidine-dna glycosylase (fpg): a dynamic description of the dna/protein interface. dna repair , – , doi: . /j.{dna}rep. . . ( ). . brooks, s. c., adhikary, s., rubinson, e. h. & eichman, b. f. recent advances in the structural mechanisms of {dna} glycosylases. biochimica et biophys. acta (bba) - proteins proteomics , – , doi: . /j.bbapap. . . ( ). . nelson, s. r., dunn, a. r., kathe, s. d., warshaw, d. m. & wallace, s. s. two glycosylase families diffusively scan dna using a wedge residue to probe for and identify oxidatively damaged bases. proc. natl. acad. sci. , e –e , doi: . /pnas. ( ). . watanabe, r., rahmanian, s. & nikjoo, h. spectrum of radiation-induced clustered non-dsb damage – a monte carlo track structure modeling and calculations. radiat. res. , – , doi: . /rr . ( ). . lomax, m. 
e., cunniffe, s. & o’neill, p. -oxog retards the activity of the ligase iii/xrcc complex during the repair of a single-strand break, when present within a clustered dna damage site. dna repair , – , doi: . /j.dnarep. . . ( ). . wood, m. l., dizdaroglu, m., gajewski, e. & essigmann, j. m. mechanistic studies of ionizing radiation and oxidative mutagenesis: genetic effects of a single -hydroxyguanine ( -hydro- -oxoguanine) residue inserted at a unique site in a viral genome. biochemistry , – , doi: . /bi a ( ). . moriya, m. single-stranded shuttle phagemid for mutagenesis studies in mammalian cells: -oxoguanine in dna induces targeted gc –> ta transversions in simian kidney cells. proc. natl. acad. sci. , – , doi: . /pnas. . . ( ). . vignard, j., mirey, g. & salles, b. ionizing-radiation induced dna double-strand breaks: a direct and indirect lighting up. radiother. oncol. , – , doi: . /j.radonc. . . ( ). . thompson, l. h. recognition, signaling, and repair of dna double-strand breaks produced by ionizing radiation in mammalian cells: the molecular choreography. mutat. res. mutat. res. , – , doi: . /j.mrrev. . . ( ). . baskar, r., lee, k. a., yeo, r. & yeoh, k.-w. cancer and radiation therapy: current advances and future directions. int. journal medical sciences , , doi: . /ijms. ( ). . sassa, a., beard, w. a., prasad, r. & wilson, s. h. dna sequence context effects on the glycosylase activity of human -oxoguanine dna glycosylase. j. biol. chem. , – , doi: . /jbc.m . ( ). . sassa, a. & odagiri, m. understanding the sequence and structural context effects in oxidative dna damage repair. dna repair , , doi: . /j.dnarep. . ( ). . case, d. et al. amber : san francisco ( ). . peters, m. b. et al. structural survey of zinc-containing proteins and development of the zinc amber force field (zaff). j. chem. theory comput. , – , doi: . /ct ( ). . maier, j. a. et al. ff sb: improving the accuracy of protein side chain and backbone parameters from ff sb. j. 
chemical theory computation , – , doi: . /acs.jctc. b ( ). . ivani, i. et al. parmbsc : a refined force field for dna simulations. nat. methods , – , doi: . /nmeth. ( ). . bignon, e., dršata, t., morell, c., lankaš, f. & dumont, e. interstrand cross-linking implies contrasting structural consequences for dna: insights from molecular dynamics. nucleic acids research , – , doi: . /nar/ gkw ( ). . bignon, e., claerbout, v. e. p., jiang, t. & dumont, e. nucleosomal embedding reshapes the dynamics of abasic sites. sci. reports , , doi: . /s - - -y ( ). . dumont, e. et al. singlet oxygen attack on guanine: reactivity and structural signature within the b-dna helix. chem. eur. j. , – , doi: . /chem. ( ). . glorot, x., bordes, a. & bengio, y. deep sparse rectifier neural networks. vol. of proceedings of machine learning research, – (jmlr workshop and conference proceedings, fort lauderdale, fl, usa, ).

structural basis of kai divergence in legume angelica m. 
guercio , françois-didier boyer , catherine rameau , alexandre de saint germain †, nitzan shabek † department of plant biology, university of california – davis, davis, ca université paris-saclay, cnrs, institut de chimie des substances naturelles, upr , , gif-sur-yvette, france institut jean-pierre bourgin, inrae, agroparistech, université paris-saclay, , versailles, france †correspondence should be addressed to: nshabek@ucdavis.edu , alexandre.de-saint- germain@inrae.fr (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . abstract the α/β hydrolase karrikin insensitive- (kai ) mediates the perception of smoke- derived butenolides (karrikins) and an elusive endogenous hormone (kai -ligand, kl) found in all land plants. it has been suggested that kai gene duplication and sub-functionalization events play an adaptative role for diverse environments by altering the receptor responsiveness to specific kls. these diversification occurrences are exemplified by the variable number of functional kai receptors among different plant species. legumes represent one of the largest families of flowering plants and contain many essential agronomic crops. along the legume lineage the kai gene underwent a duplication event resulting in kai a and kai b. here we show that the model legume, pisum sativum (ps), expresses three distinct kai homologues, two of which, kai a and kai b have uniquely sub-functionalized. we characterize biochemically the distinct ligand sensitivities between these divergent receptors and report the first crystal structure of pskai in apo and butenolide-bound states. 
our study provides a comprehensive examination of the specialized ligand binding ability of legume kai a and kai b and sheds light on the perception and enzymatic mechanism of the kai -butenolide complex.

introduction

karrikins (kars) are a family of butenolide small molecules produced from the combustion of vegetation and are a bio-active component of smoke – . these molecules are capable of inducing germination of numerous species of plants, even those not associated with fire or fire-prone environments such as arabidopsis , – . through studies in arabidopsis, kar sensitivity was shown to be dependent on three key proteins: a kar receptor, an α/β hydrolase karrikin insensitive (kai ), an f-box more axillary growth (max ) component of the skp -cullin-f-box (scf) e ubiquitin ligase, and the proposed target of ubiquitination and degradation, the transcriptional corepressor smax /smxl , – . an increasing number of studies have shown that kai and kar signaling components are involved in the regulation of many plant developmental processes including seedling development, leaf shape, cuticle formation, and root development, as well as play roles in am fungi symbiosis and abiotic stress response – , – . the striking similarities between kar and strigolactone (sl) signaling pathways have been the focus of an increasing number of studies. both sls and kars share a similar butenolide ring structure but instead of the kar pyran moiety, the butenolide is connected via an enol ether bridge to either a tricyclic lactone (abc rings) in canonical sls, or to a structural variety in non-canonical sls , . 
the receptor for sl, dwarf (d ), shares an α/β hydrolase fold similar to that of kai and a parallel signaling cascade requiring the function of the max ubiquitin ligase and downregulation of smxls, corepressors which also share some structural elements with smax /smxl , , , . unlike kars, sls are plant hormones that act endogenously, but were also found to be exuded by plant roots. sls regulate diverse physiological responses such as promoting hyphal branching of arbuscular mycorrhizal (am) fungi to enhance the efficiency of am symbiosis, stimulating germination of root parasitic plant species, repressing shoot branching, affecting lateral root formation, primary root growth, root hair elongation, secondary growth in the stem, leaf senescence, and adventitious root formation – . notably, kai family receptors have undergone numerous duplication events within various land plant lineages. d was found to have arisen from an ancient duplication of the kai receptor in the seed plant lineage followed by sub-functionalization of the receptor, making it uniquely implicated in sl signaling – . the age-old question in receptor diversity has been the evolutionary purpose and functional significance of kai duplication events. it has been shown that d and kai are not able to complement each other's functions in planta – . to this end, within the ligand binding site of kai receptors the substitution of a few amino acids can alter ligand specificity between kai duplicated copies , . while the role of the d receptor in sl signaling is well established, kai receptors and kar signaling are less understood.
furthermore, given the fact that kai is ancestral to d and that kar signaling controls diverse developmental processes including those unrelated to fire, it has been suggested that kai s are able to perceive an endogenous ligand(s), which is currently unknown and tentatively named kai -ligand (kl) , , , . thus far, several crystal structures of kai /d receptors have been reported and have led to a greater understanding of receptor-ligand perception and the hydrolytic activity of the receptor towards certain ligands , , , , , , – . the divergence between duplications of kai receptors to confer altered ligand specificity has been partially addressed at the physiological and biochemical level for only a few plant species, and structural examination has been limited – , . legumes represent one of the largest families of flowering plants and contain many essential crops. beyond their agronomic value, most legume species are unique among plants because of their ability to fix nitrogen by utilizing symbiosis with rhizobia, in addition to am fungi symbiosis. because of the potential functional diversification and specialization of kai -ligand, we characterized and examined the kai receptor mechanism in legumes, using pisum sativum (ps) as a model. in this study, we examined the implications of the pre-legume kai duplication event that resulted in legume kai a and kai b clades . we found that pisum sativum expresses three distinct kai homologues, two of which, kai a and kai b, have uniquely sub-functionalized.
we biochemically characterize the distinct ligand sensitivities of these divergent receptors and further report the first crystal structure of pskai b in apo and a unique butenolide-bound state at high resolution ( . Å and . Å, respectively). altogether our findings provide a comprehensive examination of the specialized ligand binding ability of legume kai a and kai b and shed light on the perception and enzymatic mechanism of kai receptors.

results

genetic identification and characterization of the legume pisum sativum kai genes
to characterize the karrikin pathway sensing mechanisms in legumes, we examined the evolutionary context of representative legume kai s. we focused on the pisum sativum genome that encodes distinct kai gene copies and represents the diversity of legume kai duplication events (figure a and figure s ). notably, the legume lineage has undergone an independent duplication event resulting in distinct kai a and kai b protein receptors. we identified three kai homologs in the pea genome that clearly group within the core kai clade by phylogenetic analysis. one (psat g ), renamed pskai b, grouped in the same subclade as the legume kai bs (including lotus japonicus, lj, kai b ) and two (psat g , termed pskai a and psat g ) in the same subclade as the legume kai as (including ljkai a ) (figure a and figure s ). psat g was very likely a pseudogene as the putative encoded protein is lacking amino acids (aa) in the middle of the protein in comparison to pskai a and pskai b. by cloning the pskai a coding sequence (cds) we identified transcripts for this gene, corresponding to splicing forms (figure b). the transcript pskai a. comes from intron splicing and produces a protein of aa.
thus, this protein shows a c-terminal extension of aa similar to ljkai a (figure s ), missing in other kai proteins. the pskai a. transcript arises from intron retention, which introduces a premature stop codon nucleotides after the end of the first exon. this leads to a aa protein showing a similar size to other kai proteins described (figure b and figure s ). from this analysis, it is clear that the kai clade has undergone an independent duplication event in the legume lineage resulting in these kai a and kai b forms (figure s a-b). to examine potential functional divergence between the pskai a and pskai b forms, we first analyzed the aa sequences and identified notable alterations in key residues, many of which are likely to be functional changes as indicated in later analyses (figure s ). to further characterize divergence of these genes we studied the expression patterns of the two pskai forms in various tissues of the pisum plant (figure c-d). interestingly, the expression of pskai s revealed a ten-fold higher expression of pskai a in comparison to pskai b and distinct patterns between the two forms in the roots, suggesting sub-functionalization between pskai a and pskai b.

pskai genes can rescue inhibition of hypocotyl elongation of kai - arabidopsis mutant
to test the function of pskai proteins in planta, a cross-species complementation was performed by transforming the arabidopsis kai - mutant with the splicing forms of pskai a (pskai a. , pskai a. , and pskai b, figure e). the proteins were expressed as fusion proteins with mcitrine or ha epitope driven by the native atkai promoter (patkai ).
the widely described hypocotyl elongation assay , was performed under low light conditions, which causes an elongated hypocotyl phenotype of the kai - mutant when compared to ler. all constructs completely restored the phenotype of the kai - mutant to wt phenotype, except the patkai ::pskai a. - xha construct, which only partially restored the phenotype of the kai - mutant (figure e). because in arabidopsis the stereoisomer of the synthetic strigolactone (−)-gr may act as a kl mimic compound by triggering developmental responses via atkai , , , we investigated hypocotyl elongation through the pskai proteins by quantifying hypocotyl length after (−)-gr treatment. only the lines expressing the atkai control protein were able to respond to the treatment, whereas none of the lines complemented with pskai s responded significantly to (−)-gr (figure e). these results suggest that pskai proteins are the functional orthologues of the arabidopsis kai ; however, the differences in ligand sensitivity between the expressed kai s were more elusive than in the recently reported study in lotus and than suggested by our subsequent biochemical results.

biochemical data reveal altered ligand specificity and activity between pskai s
to investigate the functional specificity between pisum kai receptors, we purified pskai recombinant proteins and investigated various ligand-interaction and ligand-enzymatic activities of the receptors (figures - and figures s -s ). we first examined kai a and kai b ligand interactions via the thermal shift assay (dsf) with various kai /d family ligands including (+) and (−)-gr enantiomers (also known as gr ds and gr ent- ds, respectively ) and (+)- and (−)- ’-epi-gr (also known as gr ent- do and gr do, respectively) (figure a-i). dsf analyses revealed that pskai b shows a larger change in stability in the presence of (−)-gr than pskai a, which shows little to no alteration (figure c-d). thus, the pskai b protein differs from its lotus ortholog, which is not destabilized by (−)-gr , suggesting different ligand specificities among legumes. the other ligands and enantiomers induce no detectable shift in stability for either pskai protein. in addition, an extensive interaction screen using intrinsic fluorescence further confirmed that only the (−)-gr stereoisomer interacts with pskai proteins (figure j and figure s ). the calculated kd revealed that pskai b has a higher affinity for (−)-gr (kd = . ± . μm) than pskai a ( . ± . μm), as also indicated by the dsf assay. to further examine the catalytic activity of kai enzymes, an enzymatic assay was performed by quantifying the hydrolytic activity of pskai towards distinct ligands. to that end, kai proteins were incubated with (+)-gr , (−)-gr , (+)- ’-epi-gr and (−)- ’-epi-gr in the presence of -indanol as an internal standard followed by ultraperformance liquid chromatography (uhplc)/uv dad analysis (figure ). the activity of pskai a and pskai b was measured in comparison to atd , atkai , and rms . these results show that pskai a could only cleave (−)-gr , whereas pskai b is able to cleave the (+)-gr , (−)-gr and (−)- ’-epi-gr stereoisomers. unlike rms , atd and atkai have no detectable cleavage for (+)- ’-epi-gr , strongly indicating that pskai s have different stereoselectivity. to further investigate the cleavage kinetics of pskai proteins, we performed an enzymatic assay with the pro-fluorescent probes that were previously designed for detecting the sl perception mechanism . here, the (±)-gc probe bearing one methyl group on the d-ring was used to measure hydrolysis activity by pskai s, rms , atd , and atkai enzymes (figure s a-b).
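A dissociation constant such as the kd values quoted above for (−)-gr is commonly estimated by fitting a single-site binding model to the ligand-dependent change in intrinsic fluorescence. The Python sketch below illustrates that idea only. The binding model, the function name `fit_single_site`, the search bounds, and the synthetic titration values are all assumptions made for illustration; the fitting procedure actually used in this study is not described in this section.

```python
def fit_single_site(ligand, signal, kd_min=1e-3, kd_max=1e4, rounds=6, steps=200):
    """Least-squares fit of signal = bmax * L / (kd + L).

    For a fixed kd the best bmax has a closed form, so only kd is searched:
    first on a coarse log-spaced grid, then on successively finer grids
    around the current best value.
    """

    def profile(kd):
        frac = [l / (kd + l) for l in ligand]  # fractional saturation at each dose
        bmax = sum(f * y for f, y in zip(frac, signal)) / sum(f * f for f in frac)
        sse = sum((y - bmax * f) ** 2 for f, y in zip(frac, signal))
        return sse, bmax

    lo, hi = kd_min, kd_max
    for _ in range(rounds):
        grid = [lo * (hi / lo) ** (i / steps) for i in range(steps + 1)]
        best = min(range(steps + 1), key=lambda i: profile(grid[i])[0])
        # Narrow the search window to the neighbours of the best grid point.
        lo, hi = grid[max(best - 1, 0)], grid[min(best + 1, steps)]
    kd = grid[best]
    return kd, profile(kd)[1]


# Synthetic, noise-free titration (hypothetical values: µM and arbitrary units).
ligand_um = [1, 5, 10, 25, 50, 100, 200, 400]
true_kd, true_bmax = 50.0, 1.0
signal = [true_bmax * l / (true_kd + l) for l in ligand_um]

kd_fit, bmax_fit = fit_single_site(ligand_um, signal)
print(f"kd = {kd_fit:.2f} uM, bmax = {bmax_fit:.3f}")
```

With real data, a lower fitted kd corresponds to tighter binding, which is the sense in which one receptor is said to have a higher affinity than the other. A nonlinear least-squares routine with parameter uncertainties would normally replace the grid search used here for self-containment.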
as expected, pskai a showed no activity, similar to atkai , as previously reported . surprisingly, pskai b is able to cleave the (±)-gc probe in a manner similar to atd and rms . it has been previously demonstrated that probes without a methyl group, such as dylg, can serve as the hydrolysis substrate for atkai . to that end, we used the (±)-gc probe bearing no methyl on the d-ring, and notably, pskai b was able to hydrolyze the probe, whereas pskai a shows little to no activity (figure s c-d). furthermore, pskai a and atkai exhibit a biphasic time course of fluorescence, consisting of an initial phase followed by a plateau phase. by comparing the kinetics profiles, we noticed that with pskai b, rms and atd proteins, the plateau is higher ( µm versus . µm of difmu), even if it takes pskai b longer to reach this plateau (figure s c-d). taken together, the comparative kinetic analysis shows that pskai b hydrolysis activity is more similar to that of sl receptors, further highlighting its function as distinct from that of pskai a.

structural insights into legume kai s divergence
to elucidate the differential ligand selectivity between kai a and kai b, we first determined the legume crystal structure of pisum sativum kai b at . Å resolution (figure and table ). the pskai b structure shares the canonical α/β hydrolase fold and is comprised of base and lid domains (figure a). the core domain contains seven-stranded mixed β-sheets (β –β ), five α-helices (αa, αb, αc, αe and αf) and five helices (η , η , η , η , and η ).
the helical lid domain (residues – , figure s ) is positioned between strands β and β and forms two parallel layers of v-shaped helices (αd - ) that create a deep pocket area adjoining the conserved catalytic ser-his-asp triad site (figure a and figure s ). despite the sequence variation ( % similarity between pskai b and atkai , figure s ), we did not observe major structural rearrangements between pskai b and the previously determined arabidopsis kai structure, as shown by a root mean square deviation (rmsd) of . Å for superposition of backbone atoms (figure b). nonetheless, further structural comparative analyses identified two unique residue alterations in positions and within the lid domain. these changes appear to marginally alter the backbone atoms and distinguish the legume kai family from the kai s of other species (figure s and figure b). the asparagine residue in position is more variable within legume kai s, and alanine or serine in position has diverged from bulky polar residues compared to other plant kai s. these amino acid alterations are likely to play a role in downstream events rather than directly modulate distinct ligand perception. to further determine the differential ligand specificity between pskai a and pskai b, we utilized the pskai b crystal structure reported here to generate a d model for pskai a. as expected, the pskai a structure exhibits a similar backbone atom arrangement (rmsd of . Å) that parallels the pskai b structure (figure a). nonetheless, we identified eight significant divergent amino acids between the two structures including residues involved in forming the ligand binding pocket as well as solvent-exposed surfaces (figure b-d and figure s a-b).
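The rmsd values quoted above measure how far matched backbone atoms lie from each other after optimal rigid-body superposition. The Python sketch below shows one standard way to obtain that minimum, Horn's quaternion method, where the largest eigenvalue of a 4×4 matrix built from the coordinate cross-covariances gives the best achievable overlap. It is an illustration only: the function names and the toy coordinates are hypothetical, the small Jacobi eigenvalue solver is included purely to keep the example dependency-free, and the software the authors used for superposition is not stated in this section.

```python
import math


def jacobi_eigenvalues(matrix, sweeps=30):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                # Keep |theta| <= pi/4; tan(2*theta) is unchanged by this shift.
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0  # annihilated entry, up to rounding
    return [a[i][i] for i in range(n)]


def superposed_rmsd(mobile, reference):
    """Minimum RMSD between two matched coordinate lists after superposition."""
    if len(mobile) != len(reference) or not mobile:
        raise ValueError("coordinate lists must be non-empty and equally long")
    n = len(mobile)

    def centered(coords):
        centroid = [sum(c[i] for c in coords) / n for i in range(3)]
        return [[c[i] - centroid[i] for i in range(3)] for c in coords]

    p, q = centered(mobile), centered(reference)
    # Cross-covariance sums: s[i][j] = sum over atoms of p_i * q_j.
    s = [[sum(a[i] * b[j] for a, b in zip(p, q)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    largest = max(jacobi_eigenvalues(key))
    total = sum(x * x for point in p + q for x in point)
    return math.sqrt(max(0.0, (total - 2.0 * largest) / n))


# Toy example (hypothetical coordinates): a rotated and shifted copy superposes exactly.
model = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
moved = [(-y + 5, x - 1, z + 2) for x, y, z in model]  # 90 degrees about z, then translated
print(f"rmsd = {superposed_rmsd(model, moved):.6f}")
```

For real structures the two coordinate lists would hold equivalent backbone atoms from an alignment of the two proteins; a value well below 1 Å, as is typical for close homologues, indicates that the folds are essentially identical even when the sequences differ.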
because these variants are evolutionarily conserved across legumes, the analysis of the underlined residues not only distinguishes between kai a and kai b in pisum but can be extrapolated to all legume kai a/b diverged proteins. structural comparative analysis within the ligand-binding pocket shows divergent solvent accessibility between pskai a and pskai b (figure b). pskai b exhibits a structural arrangement that results in a larger volume of the hydrophobic pocket ( . a ) yet with a smaller entrance circumference ( . Å) than pskai a ( . a and . Å, respectively, figure b). further in silico docking experiments of (−)-gr with pskai b result in a successful docking of the ligand, which is totally buried in the pocket and positioned in a pre-hydrolysis orientation near the catalytic triad. in contrast, docking experiments of (−)-gr with pskai a result in a more restricted interaction where the ligand is partially outside the pocket (figure s c). notably, there are five key residues that are found to directly alter the pocket morphology (figure c and figure s a-b). among these residues, l /s /m in pskai a and the corresponding residues, m /l /l in pskai b, are of particular interest because of their functional implications in the pocket volume and solvent accessibility (figure d). residue is positioned at the entrance of the ligand-binding pocket in helix αd , thus the substitution of leucine (l in kai a) to methionine (m in kai b) results in modifying the circumference of the pskai b pocket entrance (figure b-d).
while both l and m represent aliphatic non-polar residues, the relatively low hydrophobicity of methionine as well as its higher plasticity are likely to play a major role in modifying the ligand pocket. the conserved legume divergence in residue (s in pskai a and l in pskai b, figure d) is positioned in helix αd and represents a major structural arrangement at the back of the ligand envelope. because leucine has moderate flexibility compared to serine and much higher local hydrophobicity, this variation largely accounts for the changes in the pocket volume and fine-tunes the available ligand orientations. further sequence and structural analysis of the variant in position (m in kai a and l in pskai b) placed it in the center of the asp loop (d-loop, region between β and αe, figure c-d). in d , the d-loop has been reported to affect sl perception and cleavage as well as impact protein-protein interactions in sl signaling , .

pskai b forms a complex with the d-oh of (−)-gr
to further examine the molecular interaction of pskai b with the enantiomeric gr , we co-crystallized and solved the structure of pskai b-(−)-gr at . Å resolution (figure a and table ). electron density map analysis of the ligand-binding pocket revealed the existence of a unique ring-shaped occupancy that is contiguously linked to the catalytic serine (s ) (figure a-b). the structural comparison of the backbone atoms between apo-pskai b and pskai b-(−)-gr did not reveal significant differences (figure s a) and is in agreement with previously reported apo and ligand bound d /kai crystal structures , , , , .
this striking similarity suggests that a major conformational change, if it indeed occurs as suggested for d , may happen after the nucleophilic attack of the catalytic serine and the (−)-gr cleavage, which is likely to be a highly unstable state for crystal lattice formation. further analysis suggests that -hydroxy- -methylbutenolide (d-oh ring), resulting from the (−)-gr cleavage, is trapped in the catalytic site (figure s b-d). the lack of a defined electron density fitting with the tricyclic lactone (abc ring) may exclude the presence of the intact gr molecule. other compounds present in the crystallization condition were tested for their ability to occupy the ser -contiguous density, and the d-oh group of (−)-gr demonstrated the highest correlation coefficient calculated score and the best fit in the pskai b co-crystal structure (figure s c). additional tests of d-oh binding including in silico docking simulations and analyses revealed a high affinity for d-oh in a specific orientation and in agreement with the structure presented here (figure s d). the most probable orientation of the d-oh positions the methyl group (c ’) together with the hydroxyl group of d-oh towards the very bottom/back of the pocket near the catalytic serine, where the o ” atom is coordinated by both n atoms of f and v (figure b-c). the hemiacetal group (c ’) of d-oh is oriented towards the access groove of the pocket with angles (between carbon and oxygen atoms) supporting the captured d-oh in an orientation in which cleavage of the intact (−)-gr may have taken place. the c ’ of d-oh appears to form a covalent bond with oγ of s (dark gray line in figure c) and generates a tetrahedral carbon atom.
the overall positioning of this molecule is strictly coordinated by f , h , g , and i residues. remarkably, the electron density around the s does not display an open d-oh group ( , , ,-trihydroxy- -methyl- -butenal as previously described for osd ) that could directly result from the nucleophilic attack event, but rather corresponds to a cyclized d-oh ring linked to the s . this d-oh ring is likely to be formed by water addition to the carbonyl group at c ’ that is generated after cleavage of the enol function and cyclization to re-form the butenolide (figure d). the formation of this adduct could also serve as an intermediate before the transfer to the histidine residue. taken together, our crystal structure highlights a potential new intermediate in the ligand cleavage mechanism by kai proteins.

discussion
the emerging characterization of karrikin/kl signaling in non-fire ecology plant receptors has been of great interest in the plant signaling field. while there are many missing pieces in the karrikin signaling puzzle, it is clear that kai serves as the key sensor in this pathway. furthermore, the coevolution between receptors and ligands in diverse contexts throughout plant evolution is of great interest in many biological fields. the limited natural occurrence of karrikin molecules and the evolutionary conservation of kai receptors throughout land plants suggest that the function of kai s is preserved to regulate plant development and response to stresses by perceiving an endogenous ligand(s) (kl). here, we identified and characterized the first kai receptors in pea (p. sativum) that serve as representatives of the independent duplication event and subsequent sub-functionalization in legumes. the identification of both pskai a and pskai b genes corroborates the recent finding that the kai gene duplication event occurred in papilionoideae before the diversification of legumes . interestingly, similarities in expression patterns are found between pea and lotus, with globally higher expression of the a clade in comparison to the b clade and specific expression in roots of the b clade in comparison to the a clade. further studies with pskai a/b mutants, e.g. on the establishment of symbioses in pisum roots, could explain this differential expression in roots, as no clear root phenotype has been observed in lotus. the occurrences of molecular coevolution of ligands and their specialized receptors have been previously demonstrated for phytohormones such as sl , aba , ga , and more recently, karrikins , . even though the exact identity of kl ligands remains to be revealed, it is likely that the ligands share a common chemical composition with sls. it has been shown that the synthetic sl analogue, rac-gr , can function by binding kai in arabidopsis , , . in this work we carried out a comprehensive biochemical interrogation and found that pskai b can form stronger interactions with the enantiomeric gr , (−)-gr , compared to pskai a. moreover, we found that while both kai s are active hydrolases, they have distinct binding affinity and stereoselectivity towards gr stereoisomers. these findings indicate, yet again, that sub-functionalization of kai s via substitutions in only a few amino acids can greatly alter ligand affinity, binding, enzymatic activity, and probably signaling with downstream partners , . kai /d crystal structures have greatly impacted our understanding of these receptor ligand-binding pockets and their ability to not only accommodate, but also hydrolyze certain ligands , , , , , , – .
the first crystal structure of legume pskai b together with the pskai a homology model reported here further substantiates the structural basis of this differential ligand selectivity. we identified conserved key amino acid changes that alter the shape of the pocket and confer altered ligand specificities. these novel atomic structures of kai enabled us to analyze the distinction between key residues l /s /m in pskai a and the corresponding residues m /l /l in pskai b. these findings further support recent in planta and biochemical studies that demonstrate that residues and are required for differential ligand specificity between lotus kai a and kai b . furthermore, residue was also identified in the parasitic plant striga hermonthica as being involved in forming differential specificity pockets between the highly variable and functionally distinct shkai s, referred to as htls , . while the changes in positions and directly reshape the pocket morphology, the variant in position is located in the center of the d-loop . the d-loop contains the aspartic acid of the catalytic triad (d ) and has been suggested to play an important role in sl perception and cleavage by d as well as downstream protein-protein interactions , . therefore, the conserved substitution between kai a and kai b (m to l , respectively) across legumes not only contributes to ligand selectivity and hydrolysis, but may also affect downstream interaction(s). based on the analogy with the d -max perception mechanism, the kai receptor is likely to adopt different conformational states upon ligand binding and cleavage.
as such, the identification of unique residue variations in the lid (between kai a and kai b, respectively in positions and ) reported here implies a sub-functionalization in the receptor regions that are likely to be involved in max and/or smax and/or smxl downstream interactions. therefore, it remains to be further elucidated whether these kai a/b distinctive residues play a role in fine tuning the formation of the protein complex with max -smax /smxl . the crystal structure of ligand bound pskai b provides a mechanistic view of perception and cleavage by kai s. based on the crystallization conditions and following a detailed investigation of the electron density, we were able to rule out common chemicals and place the (−)-gr d-oh ring with higher relative fitting values than other components. the absence of positive electron density peaks corresponding to the intact (−)-gr , and thus the presence of only the d-oh, raise the question of whether the s -d-oh adduct recapitulates a pre- or post-cleavage intermediate state of (−)-gr . the possibility that the trapped molecule represents a post-cleavage state is intriguing and may provide a new intermediate state where s is covalently linked to the cleavage product. as such, the s -d-oh adduct could explain the single turnover cycle that was observed for kai s in this study. previous studies of the single turnover activity of d suggest that a covalent intermediate is formed between the catalytic histidine and serine . the chemical similarity of the d-oh butenolide ring of karrikin and gr suggests that the kl signal may share a parallel structure and perhaps will be biochemically processed via multiple steps and intermediate adducts.
therefore, this study may also point to a similar mechanism for sl perception and cleavage by d . our data in planta clearly demonstrate that pskai a and pskai b genes can replace the atkai ortholog, yet we were unable to determine kai a/b ligand binding specificity by using the kl mimic compound (−)-gr . the ambiguity in detecting ligand specificity in vivo is likely to remain a challenge in the karrikin field until the identification of endogenous kl. once kl(s) are revealed, it will be important to test the response of the arabidopsis complementation lines to kl(s) and further validate the function of the key residues l /s /m in planta. additionally, future studies with pea mutants will elucidate pskai a and pskai b functional divergence and reveal the distinct physiological functions, in particular the symbiotic relationship with am fungi, which could shed light on the differential expression patterns in the roots. this study illuminates the complex evolution of kai s in plants and particularly in legumes. we provide comprehensive structural and biochemical evidence of the specialization and sub-functionalization of kai receptors and their sensitivity to butenolide compounds. because of their ability to fix atmospheric nitrogen through plant–rhizobium symbiosis, legume crops such as pea or fava bean are attracting increasing attention for their agroecological potential. thus, better understanding of kar/kl perception and signaling in these staple crops may have far-reaching impacts on agro-systems and food security.
methods

protein sequence alignment and phylogenetic tree analyses
representative kai amino acid sequences were downloaded from phytozome and specific genome databases as shown in figure s . alignment was performed in mega x using the muscle multiple sequence alignment algorithm . sequence alignment graphics were generated using clc genomics workbench v . the evolutionary history was inferred by using the maximum likelihood method and jtt matrix-based model . initial tree(s) for the heuristic search were obtained automatically by applying neighbor-join and bionj algorithms to a matrix of pairwise distances estimated using the jtt model, and then selecting the topology with superior log likelihood value. the percentage of trees in which the associated taxa clustered together is shown next to the branches . the tree is drawn to scale, with branch lengths measured in the number of substitutions per site. the analysis involved amino acid sequences with a total of positions in the final dataset. evolutionary analyses were conducted in mega x .

constructs and generation of transgenic lines
the expression vectors for transgenic arabidopsis were constructed by multisite gateway three-fragment vector construction kit (invitrogen). atkai and pskai a. constructs were tagged with xha epitope tag or mcitrine protein at their c-terminus. lines were resistant to hygromycin. the atkai native promoter ( . kb) was amplified by pcr with the primer atkai _promo_attb ( ’-ggggacaactttgtatagaaaagttgccttcacgaccagtatggtttactca- ‘) and atkai _promo_attb r ( ’-ggggactgcttttttgtacaaacttgcctctctaaagaagattcttctctggtt- ‘) from col- genomic dna and cloned into the pdonr-p p r vector, using gateway recombination (invitrogen).
the xha with linker and mcitrine tags were cloned into pdonr-p rp (invitrogen) as described previously (de saint germain et al.). the pskai a. , pskai a. and pskai b cds were pcr amplified from pisum cv. térèse cdna with the primers pskai a_attb ( ’- ggggacaagtttgtacaaaaaagcaggcttcatggggatagtggaagaagca- ‘); pskai a. _attb _stop ( ’-ggggaccactttgtacaagaaagctgggtccaaatctgcctcaagtttca- ‘); pskai a. _attb _stop ( ’- ggggaccactttgtacaagaaagctgggtcccttattggctcaatattaa- ‘); pskai b_attb ( ’- ggggacaagtttgtacaaaaaagcaggcttcatgggaatagtggaagaagc- ‘); pskai b_attb _stop ( ’-ggggaccactttgtacaagaaagctgggtcagctacaatatcataacgaa- ‘); the atkai cds was pcr amplified from col- cdna with the primers atkai _attb ( ’-ggggacaagtttgtacaaaaaagcaggcttcatgggtgtggtagaagaagc- ‘) and atkai _attb _Δs ( ’-ggggaccactttgtacaagaaagctgggtccatagcaatgtcattacgaat- ‘). the amplified cds were then recombined into the pdonr vector (invitrogen). the suitable combinations of atkai native promoter, atkai , pskai a. , pskai a. or pskai b, and xha or mcitrine were cloned into the ph m gw final destination vectors using the three-fragment recombination system and were named patkai ::atkai - xha, patkai ::atkai - mcitrine, patkai ::pskai a. - xha, patkai ::pskai b- xha and patkai ::pskai a. - mcitrine. transformation of the arabidopsis atkai - mutant was performed according to the conventional floral dipping method, with agrobacterium strain gv . for each construct, only a few independent t lines were isolated and all lines were selected in t . the phenotypic analysis shown in figure e was performed on the t homozygous lines.
hypocotyl elongation assays
arabidopsis seeds were surface sterilized by consecutive treatments of min % (v/v) ethanol with .
% (w/v) sodium dodecyl sulfate (sds) and min % (v/v) ethanol. seeds were then sown on half-strength murashige and skoog (½ ms) medium (duchefa biochemie) containing % agar, supplemented with μm (−)-gr or with . % dmso (control). seeds were stratified at °c ( days in the dark) and then transferred to a growth chamber at °c, under - µe/m²/sec of white light in long-day conditions ( hr light/ hr dark). seedlings were photographed and hypocotyl lengths were quantified using imagej. plates of - seeds were sown for each genotype × treatment combination. student's t-tests detected no statistically significant differences in means between plates. the data from the - seedlings were then used for a one-way anova.
chemicals
enantiopure gr isomers were obtained as described previously (de saint germain et al.) or purchased from strigolab. profluorescent probes (gc , gc ) were obtained as described previously (de saint germain et al.).
protein preparation and purification
pskai a. and pskai b were independently cloned and expressed as × his-sumo fusion proteins from the expression vector pal (addgene). these were cloned using the primers pskai a_f ( ’-aaaacctctacttccaatcgatggggatagtggaagaag- ‘), pskai a. _r ( ’- ccacactcatcctccggttacaaatctgcctcaagtttc- ‘), pskai a. _r ( ’- ccacactcatcctccggttaccttattggctcaatattaagttg- ‘), pskai b_f ( ’- aaaacctctacttccaatcgatgggaatagtggaagaagc- ‘), and pskai b_r ( ’- ccacactcatcctccggtcaagctacaatatcataacgaatg- ‘). bl (de ) cells transformed with the expression plasmid were grown in lb broth at °c to an od of ∼ . and induced with . mm iptg for h. cells were harvested, re-suspended and lysed in extract buffer ( mm tris, ph . , mm nacl, mm imidazole, % glycerol).
all his-sumo-pskai s were isolated from soluble cell lysate by ni-nta resin. the his-sumo-pskai was eluted with mm imidazole and subjected to anion-exchange chromatography. the eluted protein was then cleaved with tev (tobacco etch virus) protease overnight at °c. the cleaved his-sumo tag was removed by passage through nickel sepharose, and pskai was further purified by chromatography through a superdex- gel filtration column in mm hepes, ph . , mm nacl, mm dtt, % glycerol. all proteins were concentrated by ultrafiltration to – mg/ml. rms , atd and atkai were expressed in bacteria with a tev-cleavable gst tag, purified and used as described previously (de saint germain et al.).
enzymatic degradation of gr isomers by purified proteins
ligands ( µm) were incubated with or without purified proteins ( µm) for min at ºc in pbs ( . ml, ph . ) in the presence of (±)- -indanol ( µm) as the internal standard. the solutions were acidified to ph with % trifluoroacetic acid in ch cn (v/v) ( µl) to quench the reaction and centrifuged ( min, , rpm). the samples were then subjected to rp-uplc-ms analyses using an ultra performance liquid chromatography system equipped with a pda and a triple quadrupole mass spectrometer detector (acquity uplc-tqd, waters, usa). rp-uplc (hss c column, . μm, . mm × mm) with . % formic acid in ch cn and . % formic acid in water (aq. fa, . %, v/v, ph . ) as eluents [ % ch cn, followed by a linear gradient from to % of ch cn ( min)] was carried out at a flow rate of . ml/min. the detection was performed by pda using the tqd mass spectrometer operated in electrospray ionization positive mode at . kv capillary voltage.
the cone voltage and collision energy were optimized to maximize the signal and were v and ev, respectively; the collision gas was argon at a pressure maintained near . . - mbar.
enzymatic assay with pro-fluorescent probes
the enzymatic assay and analysis were carried out as described previously (de saint germain et al.), using a tristar lb multimode microplate reader from berthold technologies. the experiments were repeated three times.
protein melting temperatures
differential scanning fluorimetry (dsf) experiments were performed on a cfx touch™ real-time pcr detection system (bio-rad laboratories, inc., hercules, california, usa) using excitation and emission wavelengths of and nm, respectively. sypro orange (λex/λem : / nm; life technologies co., carlsbad, california, usa) was used as the reporter dye. samples were heat-denatured using a linear to °c gradient at a rate of . °c per minute after incubation at °c for min in the absence of light. the denaturation curve was obtained using cfx manager™ software. final reaction mixtures were prepared in triplicate in -well white microplates, and each reaction was carried out at μl scale in phosphate buffer saline (pbs) ( mm phosphate, ph . , mm nacl) containing μg protein (such that final reactions contained μm protein), - μm ligand (as shown in figure a-h), % (v/v) dmso, and . μl sypro orange. plates were incubated in darkness for minutes before analysis. in the control reaction, dmso was added instead of ligand. the experiments were repeated three times.
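as an illustration of how a melting temperature can be read from the dsf denaturation curves described above, the following python sketch averages replicate melt curves and reports the temperature at which -d(rfu)/dt is minimal, that is, where fluorescence rises most steeply. it is not the cfx manager™ algorithm: the function names, the central-difference derivative and the synthetic sigmoid data are all assumptions made for this example.

```python
import math


def mean_curve(replicates):
    """Average replicate fluorescence curves point by point."""
    n_points = len(replicates[0])
    if any(len(r) != n_points for r in replicates):
        raise ValueError("all replicates must have the same number of points")
    return [sum(values) / len(replicates) for values in zip(*replicates)]


def melting_temperature(temps, rfu):
    """Return the temperature at which -d(RFU)/dT is minimal.

    Hypothetical helper: uses a central finite difference on interior
    points, which is an assumption and not the instrument software's method.
    """
    if len(temps) != len(rfu) or len(temps) < 3:
        raise ValueError("need at least three matched (temperature, RFU) points")
    best_temp, best_value = None, math.inf
    for i in range(1, len(temps) - 1):
        slope = (rfu[i + 1] - rfu[i - 1]) / (temps[i + 1] - temps[i - 1])
        if -slope < best_value:
            best_temp, best_value = temps[i], -slope
    return best_temp


# Synthetic example: three noiseless replicates of a sigmoidal melt curve
# centred on 50 °C (made-up data, for illustration only).
temps = [25.0 + 0.5 * i for i in range(141)]
replicates = [
    [scale / (1.0 + math.exp(-(t - 50.0) / 2.0)) for t in temps]
    for scale in (0.9, 1.0, 1.1)
]
tm = melting_temperature(temps, mean_curve(replicates))
print(f"estimated Tm: {tm:.1f} °C")
```

with real plate-reader exports, the curves would usually need smoothing before differentiation, and a ligand-induced shift would be read as the difference between the estimate for a ligand-containing well and that for the dmso control.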
intrinsic tryptophan fluorescence assays and kinetics
intrinsic tryptophan fluorescence assays and determination of the dissociation constant kd were performed as described previously (de saint germain et al.), using the spark® multimode microplate reader from tecan.
crystallization, data collection and structure determination
the crystals of pskai b were grown at °c by the hanging-drop vapor diffusion method with . μl purified protein sample mixed with an equal volume of reservoir solution containing . m hepes ph . , . % v/v peg , . % v/v peg-me . the crystals of pskai b in complex with (−)-gr were grown at °c by the hanging-drop vapor diffusion method with . μl purified protein complex (preincubated with mm (−)-gr , strigolab) mixed with an equal volume of reservoir solution containing . m hepes ph . , . % peg , . % v/v peg-me , mm (−)-gr . crystals of maximum size were obtained and harvested after weeks from the reservoir solution with additional % mpd serving as cryoprotectant. x-ray diffraction data were integrated and scaled with the hkl package. pskai s crystal structures were determined by molecular replacement using the atkai model (pdb: z h) as the search model. all structural models were manually built, refined, and rebuilt with phenix and coot.
structural biology modelling and analyses
model structure illustrations were made with pymol. the pskai a model structure was generated using i-tasser. ligand identification, ligand-binding pocket analyses, and computation of solvent-accessible surface values were carried out using phenix ligandfit, castp software, and autodock vina, respectively.
the ligplot+ program was used for -d representation of protein-ligand interactions from standard pdb data format.
data availability
the atomic coordinates of the apo and ligand-bound forms of pskai structures have been deposited in the protein data bank with accession codes k z and k , respectively. all relevant data are available from the corresponding authors upon request.
acknowledgements
we thank the beamline staff at als for help with data collection. this work is supported by uc davis new faculty start-up funds. the shabek laboratory is supported by the national science foundation. this work is supported by the institut jean-pierre bourgin's plant observatory technological platforms. f.-d.b. is supported by charm at labex program (anr- -labx- ). a.d.s.g. is supported by agreenskills from the european union in the framework of the marie-curie fp cofund people programme and a fellowship from saclay plant sciences (anr- -eur- ).
author contributions
am.g., f.-d.b., c.r., a.ds.g., and n.s. conceived and designed the experiments. n.s., a.ds.g., and am.g. conducted the protein purification, biochemical and crystallization experiments. n.s. and am.g. determined and analyzed crystal structures and conducted in silico studies. am.g., a.ds.g., and n.s. wrote the manuscript with help from all other co-authors.
author information
the authors declare no competing interests. correspondence and requests for materials should be addressed to n.s. (nshabek@ucdavis.edu) and a.ds.g. (alexandre.de-saint-germain@inrae.fr).
references
. flematti, g. r., ghisalberti, e. l., dixon, k. w. & trengove, r. d. a compound from smoke that promotes seed germination. science ( -. ). , ( ). . nelson, d. c. et al. karrikins enhance light responses during germination and seedling development in arabidopsis thaliana. proc. natl. acad. sci. u. s. a. , – ( ). . sun, x. d. & ni, m. hyposensitive to light, an alpha/beta fold protein, acts downstream of elongated hypocotyl to regulate seedling de-etiolation. mol. plant , – ( ). . waters, m. t. et al. specialisation within the dwarf protein family confers distinct responses to karrikins and strigolactones in arabidopsis. development , – ( ). . flematti, g. r. et al. preparation of h-furo[ , -c]pyran- -one derivatives and evaluation of their germination-promoting activity. j. agric. food chem. , – ( ). . flematti, g. r., scaffidi, a., dixon, k. w., smith, s. m. & ghisalberti, e. l. production of the seed germination stimulant karrikinolide from combustion of simple carbohydrates. j. agric. food chem. , – ( ). . dixon, k. w., merritt, d. j., flematti, g. r. & ghisalberti, e. l. karrikinolide - a phytoreactive compound derived from smoke with applications in horticulture, ecological restoration and agriculture. acta hortic. , ( ). . stevens, j. c., merritt, d. j., flematti, g. r., ghisalberti, e. l. & dixon, k. w. seed germination of agricultural weeds is promoted by the butenolide -methyl- h-furo[ , - c]pyran- -one under laboratory and field conditions. plant soil , – ( ). . long, r. l. et al. prior hydration of brassica tournefortii seeds reduces the stimulatory effect of karrikinolide on germination and increases seed sensitivity to abscisic acid. ann. bot. , – ( ). . nelson, d. c. et al. f-box protein max has dual roles in karrikin and strigolactone signaling in arabidopsis thaliana. proc. natl. acad. sci. , – ( ). . guo, y., zheng, z., la clair, j. j., chory, j. & noel, j. p.
smoke-derived karrikin perception by the a/b hydrolase kai from arabidopsis. proc. natl. acad. sci. , – ( ). . kagiyama, m. et al. structures of d and d l in the strigolactone and karrikin signaling pathways. genes to cells , – ( ). . stanga, j. p., smith, s. m., briggs, w. r. & nelson, d. c. suppressor of more axillary growth controls seed germination and seedling development in arabidopsis. plant physiol. , – ( ). . gutjahr, c. et al. rice perception of symbiotic arbuscular mycorrhizal fungi requires the karrikin receptor complex. science ( -. ). , – ( ). . li, w. et al. the karrikin receptor kai promotes drought resistance in arabidopsis thaliana. plos genet. , e ( ). . wang, l., waters, m. t. & smith, s. m. karrikin-kai signalling provides arabidopsis seeds with tolerance to abiotic stress and inhibits germination under conditions unfavourable to seedling establishment. new phytol. , – ( ). . scaffidi, a. et al. exploring the molecular mechanism of karrikins and strigolactones. bioorganic med. chem. lett. , – ( ). . yoneyama, k. recent progress in the chemistry and biochemistry of strigolactones. j. pestic. sci. , – ( ). . zhao, l. h. et al. crystal structures of two phytohormone signal-transducing α/β hydrolases: karrikin-signaling kai and strigolactone-signaling dwarf . cell res. , – ( ). . cook, c. e., whichard, l. p., turner, b., wall, m. e. & egley, g. h. germination of witchweed (striga lutea lour.): isolation and properties of a potent stimulant. science ( -. ). , – ( ). . sorefan, k. et al. max and rms are orthologous dioxygenase-like genes that regulate shoot branching in arabidopsis and pea. genes dev. , – ( ). . kapulnik, y. et al.
strigolactones interact with ethylene and auxin in regulating root-hair elongation in arabidopsis. j. exp. bot. ( ) doi: . /jxb/erq . . rasmussen, a. et al. strigolactones suppress adventitious rooting in arabidopsis and pea. plant physiol. , – ( ). . lopez-obando, m., ligerot, y., bonhomme, s., boyer, f. d. & rameau, c. strigolactone biosynthesis and signaling in plant development. dev. ( ) doi: . /dev. . . akiyama, k., matsuzaki, k. i. & hayashi, h. plant sesquiterpenes induce hyphal branching in arbuscular mycorrhizal fungi. nature , – ( ). . gomez-roldan, v. et al. strigolactone inhibition of shoot branching. nature , – ( ). . arite, t. et al. d , a strigolactone-insensitive mutant of rice, shows an accelerated outgrowth of tillers. plant cell physiol. , – ( ). . besserer, a. et al. strigolactones stimulate arbuscular mycorrhizal fungi by activating mitochondria. plos biol. , e ( ). . li, s. w., xue, l., xu, s., feng, h. & an, l. mediators, genes and signaling in adventitious rooting. bot. rev. , – ( ). . agusti, j. et al. strigolactone signaling is required for auxin-dependent stimulation of secondary growth in plants. proc. natl. acad. sci. u. s. a. , – ( ). . hamiaux, c. et al. dad is an α/β hydrolase likely to be involved in the perception of the plant branching hormone, strigolactone. curr. biol. , – ( ). . kapulnik, y. et al. strigolactones affect lateral root formation and root-hair elongation in arabidopsis. planta , – ( ). . bythell-douglas, r. et al. evolution of strigolactone receptors by gradual neo-functionalization of kai paralogues. bmc biol. , – ( ). . swarbreck, s. m., guerringue, y., matthus, e., jamieson, f. j. c. & davies, j. m.
impairment in karrikin but not strigolactone sensing enhances root skewing in arabidopsis thaliana. plant j. , – ( ). . toh, s. et al. structure-function analysis identifies highly sensitive strigolactone receptors in striga. science ( -. ). , – ( ). . xu, y. et al. structural basis of unique ligand specificity of kai -like protein from parasitic weed striga hermonthica. sci. rep. , – ( ). . waters, m. t. et al. a selaginella moellendorffii ortholog of karrikin insensitive functions in arabidopsis development but cannot mediate responses to karrikins or strigolactones. plant cell , – ( ). . bürger, m. et al. structural basis of karrikin and non-natural strigolactone perception in physcomitrella patens. cell rep. , – ( ). . sun, y. k. et al. divergent receptor proteins confer responses to different karrikins in two ephemeral weeds. nat. commun. , ( ). . de saint germain, alexandre, jacobs, a., brun, g. & boyer, f.-d. a phelipanche ramosa kai protein perceives enzymatically strigolactones and isothiocyanates. biorxiv ( ) doi: . / . . . . . sun, y. k., flematti, g. r., smith, s. m. & waters, m. t. reporter gene-facilitated detection of compounds in arabidopsis leaf extracts that activate the karrikin signaling pathway. front. plant sci. , ( ). . carbonnel, s. et al. lotus japonicus karrikin receptors display divergent ligand-binding specificities and organ-dependent redundancy. biorxiv ( ) doi: . / . . conn, c. e. & nelson, d. c. evidence that karrikin-insensitive (kai ) receptors may perceive an unknown signal that is not karrikin or strigolactone. front. plant sci. , – ( ). . shabek, n. et al. structural plasticity of d –d ubiquitin ligase in strigolactone signalling. nature , – ( ). . xu, y. et al.
structural analysis of htl and d proteins reveals the basis for ligand selectivity in striga. nat. commun. , ( ). . takeuchi, j. et al. rationally designed strigolactone analogs as antagonists of the d receptor. plant cell physiol. , – ( ). . hamiaux, c. et al. inhibition of strigolactone receptors by n-phenylanthranilic acid derivatives: structural and functional insights. j. biol. chem. , – ( ). . bythell-douglas, r. et al. the structure of the karrikin-insensitive protein (kai ) in arabidopsis thaliana. plos one , e ( ). . nakamura, h. et al. molecular mechanism of strigolactone perception by dwarf . nat. commun. , ( ). . zhao, l. h. et al. destabilization of strigolactone receptor dwarf by binding of ligand and e -ligase signaling effector dwarf . cell res. , – ( ). . yao, r. et al. dwarf is a non-canonical hormone receptor for strigolactone. nature , – ( ). . kreplak, j. et al. a reference genome for pea provides insight into legume genome evolution. nat. genet. , – ( ). . yao, j. et al. an allelic series at the karrikin insensitive locus of arabidopsis thaliana decouples ligand hydrolysis and receptor degradation from downstream signalling. plant j. , – ( ). . scaffidi, a. et al. strigolactone hormones and their stereoisomers signal through two related receptor proteins to induce different physiological responses in arabidopsis. plant physiol. , – ( ). . de saint germain, a. et al. an histidine covalent receptor and butenolide complex mediates strigolactone perception. nat. chem. biol. , – ( ). . seto, y. et al. strigolactone perception and deactivation by a hydrolase receptor dwarf . nat. commun. , ( ). . conn, c. e. et al.
convergent evolution of strigolactone perception enabled host detection in parasitic plants. science ( -. ). , – ( ). . weng, j. k., ye, m., li, b. & noel, j. p. co-evolution of hormone metabolism and signaling networks expands plant adaptive plasticity. cell , – ( ). . yoshida, h. et al. evolution and diversification of the plant gibberellin receptor gid . proc. natl. acad. sci. u. s. a. , e –e ( ). . kumar, s., stecher, g., li, m., knyaz, c. & tamura, k. mega x: molecular evolutionary genetics analysis across computing platforms. mol. biol. evol. , – ( ). . edgar, r. c. muscle: multiple sequence alignment with high accuracy and high throughput. nucleic acids res. , – ( ). . jones, d. t., taylor, w. r. & thornton, j. m. the rapid generation of mutation data matrices from protein sequences. bioinformatics , – ( ). . felsenstein, j. confidence limits on phylogenies: an approach using the bootstrap. evolution (n. y). , – ( ). . karimi, m., bleys, a., vanderhaeghen, r. & hilson, p. building blocks for plant gene assembly. plant physiol. , – ( ). . clough, s. j. & bent, a. f. floral dip: a simplified method for agrobacterium-mediated transformation of arabidopsis thaliana. plant j. , – ( ). . schneider, c. a., rasband, w. s. & eliceiri, k. w. nih image to imagej: years of image analysis. nat. methods , – ( ). . otwinowski, z. & minor, w. processing of x-ray diffraction data collected in oscillation mode. methods enzymol. , – ( ). . lee, i. et al. a missense allele of karrikin-insensitive impairs ligand-binding and downstream signaling in arabidopsis thaliana. j. exp. bot. , – ( ). . adams, p. d. et al. phenix: a comprehensive python-based system for macromolecular structure solution. acta crystallogr.
sect. d biol. crystallogr. , – ( ). . emsley, p., lohkamp, b., scott, w. g. & cowtan, k. features and development of coot. acta crystallogr. sect. d biol. crystallogr. , – ( ). . delano, w. l. the pymol molecular graphics system, version . . schrödinger llc ( ). . yang, j. & zhang, y. i-tasser server: new development for protein structure and function predictions. nucleic acids res. , w –w ( ). . roy, a., kucukural, a. & zhang, y. i-tasser: a unified platform for automated protein structure and function prediction. nat. protoc. , – ( ). . yang, j. et al. the i-tasser suite: protein structure and function prediction. nat. methods , – ( ). . moriarty, n. w., grosse-kunstleve, r. w. & adams, p. d. electronic ligand builder and optimization workbench (elbow): a tool for ligand coordinate and restraint generation. acta crystallogr. sect. d biol. crystallogr. , – ( ). . terwilliger, t. c., klei, h., adams, p. d., moriarty, n. w. & cohn, j. d. automated ligand fitting by core-fragment fitting and extension into density. acta crystallogr. sect. d biol. crystallogr. , – ( ). . binkowski, t. a., naghibzadeh, s. & liang, j. castp: computed atlas of surface topography of proteins. nucleic acids res. , – ( ). . dundas, j. et al. castp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. nucleic acids res. , w –w ( ). . steffen, c. et al. autodock and autodocktools : automated docking with selective receptor flexibility. j. comput. chem. , – ( ). . laskowski, r. a. & swindells, m. b. ligplot+: multiple ligand-protein interaction diagrams for drug discovery. j. chem. inf. model. , – ( ).
figure legends
figure .
evolutionary analysis and differential expression of the legume pisum sativum kai s. (a) maximum likelihood phylogeny of representative kai amino acid sequences. node values represent the percentage of trees in which the associated taxa clustered together. vertical rectangles highlight distinct kai family clades. the black circle indicates the legume duplication event. pink and green circles mark the positions of pskai as and pskai b, respectively. the tree is drawn to scale, with branch lengths measured in the number of substitutions per site. (b) pskai a and pskai b are homologues of atkai and encode α/β-hydrolases. schematic representation of the pskai a and pskai b genes; exons are shown as thick pink and green lines, introns as thin gray lines and utr regions as thick gray lines. bases are numbered from the start codon. pskai a shows splicing variants. spliced introns are shown as bent (“v”) lines. bold lines represent intron retention. inverted triangles (▼) indicate premature termination codons. (c-d) differential expression pattern of pskai a (c, pink) and pskai b (d, green). transcript levels in the different tissues of old wild-type pisum sativum plants (cv. terese) were determined by real-time pcr, relative to psef α. data are means ± se (n = pools of plants). the inset drawing of a node shows the different parts of the pea compound leaf. (e) hypocotyl length of -day-old seedlings grown under low light at °c. data are means ± se (n = - ; plates of - seedlings per plate). grey bars: mock (dmso); orange bars: (−)-gr ( µm). complementation assays used the atkai promoter to express atkai (control) or pskai genes, as noted above the graph. proteins were tagged with xha epitope or mcitrine protein. statistical differences were determined using a one-way anova with a tukey multiple comparison of means post-hoc test; statistical differences of p< . are represented by different letters.
means with asterisks indicate significant inhibition compared to mock-treated seedlings, with *** corresponding to p ≤ . and * to p ≤ . , as measured by t-test.
figure . biochemical analysis of pskai a and pskai b interactions with different gr isomers. the melting temperature curves of µm pskai a (a, c, e, g) or pskai b (b, d, f, h) with (+)-gr (a-b), (−)-gr (c-d), (+)- ’-epi-gr (e-f), or (−)- ’-epi-gr (g-h) at varying concentrations are shown as assessed by dsf. each line represents the average protein melt curve for three technical replicates; the experiment was carried out twice. (i) chemical structures of the ligands used in the dsf assay (a-h). (j) plots of fluorescence intensity versus sl concentrations. the change in intrinsic fluorescence of atkai , pskai a and pskai b was monitored (see figure s ) and used to determine the apparent kd values. the plots represent the mean of two replicates and the experiments were repeated at least three times. the analysis was performed with graphpad prism . software.
figure . comparative enzymatic activity of atd , atkai , rms , pskai a and pskai b proteins with gr isomers. uplc-uv ( nm) analysis showing the formation of the abc tricycle from gr isomers. the hydrolysis activity of the enzymes ( µm) was monitored after incubation with µm (+)-gr (yellow), (−)-gr (orange), (+)- ’-epi-gr (blue), or (−)- ’-epi-gr (purple). the indicated percentage corresponds to the hydrolysis rate calculated from the remaining gr isomer, quantified in comparison with indanol as an internal standard. data are means ± se (n = ). nd = no cleavage detected.
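to make the cleavage percentage reported for the gr isomer hydrolysis assay concrete, the python sketch below shows one way such a value could be computed from peak areas of the remaining gr isomer and of the indanol internal standard. the exact formula is not stated in the text, so the normalization to an enzyme-free control, the function name and the example peak areas are assumptions made for illustration.

```python
def cleavage_percent(substrate_area, standard_area,
                     control_substrate_area, control_standard_area):
    """Percent of substrate cleaved, from internal-standard-normalized areas.

    Assumed calculation (not taken from the source): the substrate peak area
    is divided by the internal standard peak area in both the enzyme sample
    and the enzyme-free control, and the remaining fraction is the ratio of
    the two normalized values.
    """
    if standard_area <= 0 or control_standard_area <= 0:
        raise ValueError("internal standard peak areas must be positive")
    if control_substrate_area <= 0:
        raise ValueError("control substrate peak area must be positive")
    sample_ratio = substrate_area / standard_area
    control_ratio = control_substrate_area / control_standard_area
    remaining_fraction = sample_ratio / control_ratio
    return 100.0 * (1.0 - remaining_fraction)


# Made-up peak areas: a quarter of the substrate remains after incubation
# with enzyme, while the internal standard signal is unchanged.
example = cleavage_percent(250.0, 1000.0, 1000.0, 1000.0)
print(f"cleavage: {example:.1f} %")
```

normalizing to the internal standard in this way corrects for differences in injection volume or detector response between the enzyme sample and the control, which is the usual reason for including a compound such as indanol.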
figure . the crystal structure of legume kai . (a) overview of the pskai b structure. lid and base domains are colored in forest and light green, respectively, with secondary structure elements labeled. (b) structural alignment of pskai b and atkai (pdb id: hta), shown in light green and wheat colors, respectively. the root-mean-square deviation (rmsd) value of the aligned structures is shown. the location and conservation of the legume kai unique residues, alanine in position (a ) and asparagine n , are highlighted on the structure as sticks as well as in the reduced multiple sequence alignment from figure s .
figure . structural divergence analysis of legume kai a and kai b. (a) structural alignment of pskai a and pskai b, shown in pink and light green colors, respectively. the rmsd of the aligned structures is shown. (b) analysis of pskai a and pskai b pocket volume, area, and morphology is shown by solvent-accessible surface presentation. pocket size values were calculated via the castp server. (c) residues involved in defining the ligand-binding pocket are shown on each structure as sticks. the catalytic triad is shown in red. (d) residues l/m , s/l , and m/l are highlighted as divergent legume kai residues, conserved among all legume kai a or kai b sequences as shown in the reduced multiple sequence alignment from figure s .
figure . structural basis of pskai b ligand interaction. (a) surface (left) and cartoon (right) representations of the pskai b crystal structure in complex with the (−)-gr d-oh ring. the protein structure is shown in blue/gray and the ligand in orange. (b) close-up view of ligand interactions and contiguous density with the catalytic serine s . electron density for the ligand is shown in navy blue and blue/gray mesh for the labeled catalytic triad. the contiguous density
between s and the d-oh ring indicates a covalent bond. the electron density is derived from the mfo-dfc ( fo-fc) map contoured at . σ. (c) side view of the pskai b-d-oh structure shown in cartoon, with the intact d-oh ring structure highlighted in orange. the -d ligand interaction plot was generated using ligplot+ software. the dark grey line represents the s -d-oh ring covalent bond. (d) schematic diagram of the proposed mechanism for the formation of the d-ring intermediate covalently bound to s .
no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . . . . µ m ( - ) - g r d f /d f m a x atkai pskai a. pskai b (+ )- g r (– )- g r (+ )- ’ -e pi -g r (– )- ’ -e pi -g r - - - - - -d (r fu )/d t temperature (°c) - - - - - -d (r fu )/d t - - - - - -d (r fu )/d t - - - - - -d (r fu )/d t - - - temperature (°c) - - - - - - - - - ba pskai a pskai b dc fe hg i (–)-gr (+)-gr (+)- ’-epi-gr (–)- ’-epi-gr j atkai pskai a pskai b µm (–)-gr kd= . +/- . µm kd= . +/- . µm kd= . +/- . µm df /d fm ax o o o o o o o o o o o o o o o o o o o o . . . (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . at d at ka i ps ka i a ps ka i b rm s c le av ag e (% ) (+)-gr (—)-gr (+)- 'epi-gr (—)- ’epi-gr nd nd (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . b a pskai b ⚬ base lid aa ad b pskai b atkai rmsd ~ . a r n d ad ad ad h h ab b ac h h b aeb aa position: le gu m e af (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . ba pskai b pskai a c rmsd = . aa position: a b m l l s l m d pskai bpskai a ⚬ ⚬ a b sa vol. . . sa area . . sa circum. . . ligand accessible surface ad ad ad d-loop d-loop (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. 
the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . f v s i g h d ’ ’ ’ ’ ’ ’ . . ’’ ’’ a pskai b (–)-gr (d-oh) b s = . ⚬ s d h s = . ( fofc) ’ ’ ’ ’ ’ ’ d-oh c d s o n n h o o c h h s o o o d o o c d pskai b pskai b = abc=cho tricycle ' ' ' ' ' ' ' s o n n hh o o c d pskai b h n n h h δ- δ+ δ- o o o oo oo oo hh abc =ch oh o ho oh ' ' d (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . table . data collection, phasing and refinement statistics pskai b (apo form, with glycerol) (−)-gr d-oh - bound pskai b data collection space group c c cell dimensions a, b, c (Å) . , . , . . , . , . α, β, γ (°) , , , . , resolution (Å) . - . ( . - . )* . - . ( . - . ) rsym . ( . ) . ( . ) i / σi . ( . ) . ( . ) completeness (%) . ( . ) . ( . ) redundancy . ( . ) . ( . ) refinement resolution (Å) . . no. reflections rwork / rfree (%) . / . . / . no. atoms protein ligand/ion water b-factors protein . . ligand/ion . . water . . r.m.s. deviations bond lengths (Å) . . bond angles (°) . . ramachandran favored (%) . . ramachandran allowed (%) . . ramachandran outliers (%) pdb id k z k *statistics for the highest-resolution shell are shown in parentheses. (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . dimerization mechanism and structural features of human li-cadherin dimerization mechanism and structural features of human li-cadherin anna yui , jose m. m. 
caaveiro , *, daisuke kuroda , , makoto nakakido , satoru nagatoishi , shuichiro goda , takahiro maruno , susumu uchiyama and kouhei tsumoto , , *

department of bioengineering, graduate school of engineering, the university of tokyo, - - , hongo, bunkyo-ku, tokyo - , japan
department of global healthcare, graduate school of pharmaceutical sciences, kyushu university, - - , maidashi, higashi-ku, fukuoka-shi, fukuoka - , japan
medical device development and regulation research center, school of engineering, the university of tokyo, - - , hongo, bunkyo-ku, tokyo - , japan
institute of medical science, the university of tokyo, - - , shirokanedai, minato-ku, tokyo - , japan
graduate school of science and engineering, soka university, - , tangi-cho, hachioji-shi, tokyo - japan
department of biotechnology, graduate school of engineering, osaka university, - yamadaoka, suita-shi, osaka - , japan
department of chemistry and biotechnology, school of engineering, the university of tokyo, - - , hongo, bunkyo-ku, tokyo - , japan

*corresponding authors: jose m. m. caaveiro and kouhei tsumoto
e-mails: jose@phar.kyushu-u.ac.jp; tsumoto@bioeng.t.u-tokyo.ac.jp
running title: dimerization mechanism of li-cadherin
keywords: cadherin, dimerization, cell adhesion, protein chemistry, crystal structure, small-angle x-ray scattering (saxs), analytical ultracentrifugation, molecular dynamics

biorxiv preprint, this version posted january , . ; doi: https://doi.org/ . / . . . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license (http://creativecommons.org/licenses/by-nc-nd/ . /).

abstract

li-cadherin is a ca +-dependent cell adhesion protein belonging to the cadherin superfamily.
its expression is observed on various types of cells in the human body, such as normal small intestine and colon cells, and gastric cancer cells. because its expression is not observed on normal gastric cells, li-cadherin is a promising target for gastric cancer imaging. however, since the cell adhesion mechanism of li-cadherin has remained unknown, rational design of therapeutic molecules targeting this cadherin has been difficult. here, we have studied the homodimerization mechanism of li-cadherin. we report the crystal structure of the li-cadherin ec - homodimer. the ec - homodimer exhibited a unique architecture different from that of other cadherins reported so far. the crystal structure also revealed that li-cadherin possesses a noncanonical calcium ion-free linker between ec and ec . various biochemical techniques and molecular dynamics (md) simulations were employed to elucidate the mechanism of homodimerization. by performing cell aggregation assays, we also showed that the formation of the homodimer observed in the crystal structure is necessary for li-cadherin-dependent cell adhesion.

introduction

cadherins are a family of glycoproteins responsible for calcium ion-dependent cell adhesion ( ). there are more than types of cadherins in humans, and many of them are not only responsible for cell adhesion but also involved in tumorigenesis ( ). human liver intestine-cadherin (li-cadherin) is a nonclassical cadherin composed of an extracellular region that includes seven extracellular cadherin (ec) repeats, a single transmembrane domain and a short cytoplasmic domain ( ). previous studies have reported the expression of li-cadherin on various types of cells, such as normal intestine cells, intestinal metaplasia, colorectal cancer cells and lymph node metastatic gastric cancer cells ( , ).
because human li-cadherin is expressed on gastric cancer cells but not on normal stomach tissues, li-cadherin has been proposed as a target for imaging of metastatic gastric cancer ( ). previous studies have reported that li-cadherin works as a calcium ion-dependent cell adhesion molecule, as other cadherins do ( ), and that trans-dimerization of li-cadherin is necessary for water transport in normal intestinal cells ( ). sequence analysis of mouse li-, e-, n-, and p-cadherins has revealed sequence homology between ec - of li-cadherin and ec - of e-, n-, and p-cadherins, as well as between ec - of li-cadherin and ec - of classical cadherins ( ). from the sequence similarity and the proposed absence of calcium ion-binding motifs ( , ) between domains ec and ec , there is speculation that li-cadherin has evolved from the same five-domain cadherin precursor as that of classical cadherins ( ). however, li-cadherin differs from classical cadherins in several respects, such as the number of extracellular cadherin repeats and the length and sequence of the cytoplasmic domain. classical cadherins possess five cadherin repeats whereas li-cadherin possesses seven ( ). classical cadherins possess a conserved cytoplasmic domain comprising more than amino acids, whereas li-cadherin possesses a short cytoplasmic domain consisting of residues with little or no sequence homology ( , ). the characteristics of li-cadherin at the molecular level, including the homodimerization mechanism, remain unknown.
homodimerization is the fundamental event in cadherin-mediated cell adhesion, as has been shown previously ( , ). for example, classical cadherins form a homodimer mediated by the interaction between their two n-terminal cadherin repeats (ec - ) ( , ). in this study, we aimed to characterize li-cadherin at the molecular level, as the molecular characteristics of the target protein may be significant for the rational design of therapeutic approaches. we extensively examined li-cadherin to identify the architecture of its homodimer. here, we report the crystal structure of the human li-cadherin ec - homodimer. the crystal structure revealed a dimerization architecture different from that of any other cadherin reported so far. it also showed canonical calcium-binding motifs between ec and ec , and between ec and ec , but not between ec and ec . by performing various biochemical and computational analyses based on this crystal structure, we interpreted the characteristics of the li-cadherin molecule. additionally, we showed through cell aggregation assays that the ec - homodimer is necessary for li-cadherin-dependent cell adhesion. our study revealed possible architectures of li-cadherin homodimers at the cell surface and suggested a differential role for the two additional domains at the n-terminus compared with classical cadherins.

results

investigation of the domains responsible for the homodimerization of li-cadherin

in order to predict which extracellular cadherin (ec) repeats are responsible for the homodimerization of li-cadherin, we compared the sequences of human li-cadherin and human classical cadherins (e-, n- and p-cadherins) using clustalw. as pointed out in a previous study ( ), ec - of human li-cadherin has sequence homology with ec - of human classical cadherins, and ec - of human li-cadherin has sequence homology with ec - of classical cadherins (fig. ).
notably, trp is located at the n-terminus of li-cadherin ec , and it has been suggested that this trp residue might function as the conserved trp of classical cadherin ec , which plays a crucial role in the formation of the strand swap-dimer (ss-dimer) ( , , – ). considering that ec - of classical cadherins is responsible for homodimerization, we predicted that ec - and ec - of li-cadherin, which have sequence homology with ec - of classical cadherins, are responsible for its dimerization. therefore, we first analyzed the homodimerization propensity of ec - , ec - , and ec - (table s ) of human li-cadherin. the dissociation constants (kd) of the ec - homodimer and the ec - homodimer were determined by sedimentation velocity analytical ultracentrifugation (sv-auc), obtaining values of . µm and . µm, respectively (fig. a). we did not observe a dimer fraction for ec - despite the sequence similarity with ec - of classical cadherins and the presence of trp in ec located at the position analogous to that of trp in ec of classical cadherin (fig. a). the solution structure of ec - was monomeric as determined by small angle x-ray scattering (saxs), supporting the results of sv-auc (fig. b, fig. s and table s ).

crystal structure analysis of ec - homodimer

we successfully obtained the x-ray crystal structure of ec - at . Å resolution (fig. and table ). each ec domain was composed of the typical seven β-strands seen in classical cadherins, and three calcium ions were bound to each of the linkers connecting ec and ec , and ec and ec (fig. ).
we also observed four n-glycans and two n-glycans bound to chain a and b, respectively, as predicted from the amino acid sequence. we could not resolve the entire length of these n-glycans because of their high flexibility. from the portion resolved, all n-glycans seem to face the opposite side of the dimer interface. two unique characteristics were observed in the crystal structure of li-cadherin: (i) the existence of a calcium-free linker between ec and ec , and (ii) the architecture of the homodimer. a previous study had suggested that li-cadherin lacks a calcium-binding motif between ec and ec ( ), and our crystal structure has confirmed that hypothesis experimentally. crystal structures of cadherins which possess a calcium-free linker have been reported previously, and the biological significance of the calcium-free linker has been discussed ( , ). the ec - region of li-cadherin assembled as an antiparallel homodimer in a conformation different from that of other cadherins, such as classical cadherins, which exhibit a two-step binding mode ( ), and protocadherin b , which forms an antiparallel homodimer ( ) but with characteristics distinct from those of li-cadherin ec - . we performed sv-auc using li-cadherin ec - and obtained a kd value of . µm (fig. s ). the slight increase in affinity suggested some contact between ec and ec , as can be predicted from the arrangement of ec of one chain and ec of the other chain in the crystal structure, although this interaction does not seem strong.

calcium-free linker

we first investigated the calcium-free linker between ec and ec . classical cadherins generally adopt a crescent-like shape ( , ). however, in li-cadherin, the arch shape was disrupted at the calcium-free linker region, and as a result ec - exhibited a unique alternating positioning of ec - with respect to ec - . generally, three calcium ions bound to the linker between each ec domain confer rigidity to the structure ( ). indeed, a previous study of a calcium-free linker in a cadherin showed that the linker has some flexibility ( ). to compare the rigidity of the canonical linker with three calcium ions and the calcium-free linker in li-cadherin, we performed md simulations. in addition to the monomeric states, we also used the structure of the ec - homodimer as the initial structure of the simulations. after confirming the convergence of the simulations by calculating rmsd values of cα atoms (fig. s , see experimental procedures for the details), we compared the rigidity of the linkers by calculating the rmsd values of cα atoms of ec and ec , respectively, after superposing those of the ec domain alone (fig. a, b). the ec domain in the monomer conformation exhibited the largest rmsd. the rmsd values of ec in the homodimer were significantly smaller than those of ec in the monomer form. dihedral angles consisting of cα atoms of residues at the edge of each domain also indicated that the ec - monomer bends largely at the ca +-free linker (fig. s a-c). these results showed that the calcium-free linker between ec and ec is more flexible than the canonical linker (movie , ). another unique characteristic in the region surrounding the calcium-free linker was the existence of an α-helix at the bottom of ec . to the best of our knowledge, this element at the bottom of the ec domain is not found in classical cadherins.
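the dihedral-angle measure mentioned above can be illustrated with a short sketch. this is not the authors' analysis script: the function names and the toy coordinates are assumptions made for illustration, and in practice the four points would be cα positions of residues at the domain edges, read from each trajectory frame.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Dihedral angle in degrees (range -180..180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(c / length for c in b1)  # unit vector along the central bond
    # Remove the components along the axis, leaving in-plane projections.
    v = _sub(b0, tuple(_dot(b0, axis) * c for c in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * c for c in axis))
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))

def dihedral_series(frames):
    """One dihedral per frame; each frame is a tuple of four (x, y, z) points."""
    return [dihedral_deg(*frame) for frame in frames]

# Hypothetical toy frames (not data from the study): a flat and a bent geometry.
toy_frames = [
    ((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)),
    ((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)),
]
angles = dihedral_series(toy_frames)
```

a wide spread of such angles over a trajectory would be one way to express the bending at the linker; a rigid linker would give a narrow spread.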
the sequence alignment of the ec - domains of human li-, e-, n- and p-cadherin by clustalw indicated that the insertion of the α-helix forming residues corresponded to the position immediately preceding the canonical calcium-binding motif dxe in classical cadherins ( ) (fig. s ). the asp and glu residues of the dxe motif in li-cadherin dimer ec and ec coordinate calcium ions (fig. s a, b), and this coordination was maintained throughout the simulation (fig. s c–j). the α-helix in ec might compensate for the absence of calcium by conferring some rigidity to the molecule.

interaction analysis of ec - homodimer

to validate whether li-cadherin-dependent cell adhesion is mediated by the formation of the homodimer observed in the crystal structure, it was necessary to find a mutant which exhibits a decreased dimerization tendency. first, we analyzed the interaction between two ec - molecules in the crystal structure using the pisa server (table s ) ( ). the interaction was mostly mediated by ec of one chain of li-cadherin and ec of the other chain, engaging in hydrogen bonds and hydrophobic contacts (fig. ). the dimerization interface area was , . Å and the number of hydrogen bonds (distance between heavy atoms < . Å) was seven. based on the analysis of these interactions, we conducted site-directed mutagenesis to assess the contribution of each residue to the dimerization of li-cadherin. eleven residues showing a percentage of buried area greater than %, or one or more intermolecular hydrogen bonds (distance between heavy atoms < . Å), were individually mutated to ala.
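the selection rule described above (buried area above a threshold, or at least one intermolecular hydrogen bond) can be sketched as a simple filter. the residue names, numbers and threshold below are hypothetical placeholders, since the actual cut-off is not legible in this text; the sketch only illustrates the "either criterion" logic and is not the authors' procedure.

```python
from dataclasses import dataclass

@dataclass
class InterfaceResidue:
    name: str              # hypothetical label, e.g. "res_a"
    buried_percent: float  # percentage of the residue's area buried on dimerization
    n_hbonds: int          # number of intermolecular hydrogen bonds

def select_for_ala_scan(residues, min_buried_percent, min_hbonds=1):
    """Return names of residues meeting either selection criterion."""
    return [
        r.name
        for r in residues
        if r.buried_percent > min_buried_percent or r.n_hbonds >= min_hbonds
    ]

# Hypothetical interface table (values invented for illustration only).
interface = [
    InterfaceResidue("res_a", buried_percent=80.0, n_hbonds=0),
    InterfaceResidue("res_b", buried_percent=10.0, n_hbonds=2),
    InterfaceResidue("res_c", buried_percent=10.0, n_hbonds=0),
]
candidates = select_for_ala_scan(interface, min_buried_percent=50.0)
```

here `res_a` qualifies through buried area and `res_b` through hydrogen bonding, while `res_c` meets neither criterion.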
to quickly find the mutant with the weakest homodimerization propensity, sec-mals was employed. we injected ec - wt or each mutant at µm into the chromatographic column. analysis of the molecular weight (mw) showed that the mw of f a was the smallest among all the constructs evaluated, including wt (fig. a and table ). the same observation was made when the samples were injected at µm (fig. s a). among the samples analyzed, the elution volume of f a was also the largest (fig. b and fig. s b). in summary, the mutational study using sec-mals suggested that the f a mutation inhibited homodimerization of ec - most strongly. we must note that the samples eluted as a single peak, corresponding to a fast equilibrium between monomers and dimers, as reported in a previous study employing other cadherins ( ). although the samples were injected at µm, they eluted at ~ µm because sec dilutes the samples as they advance through the column. considering that the kd of dimerization of ec - wt determined by auc was . µm, at a protein concentration of µm the largest fraction of the eluted sample should be monomer. this explains why the mw of the wt sample was smaller than the mw of the homodimer ( . kda), and why the differences in mw among the constructs were small. however, we assume that the decrease in mw and the increase in elution volume indicate a decrease in the proportion of homodimer in the eluted sample, and therefore a smaller dimerization tendency caused by the mutations introduced in the protein.

contribution of phe to dimerization

although f does not seem to form extensive specific interactions with the partner molecule of li-cadherin in our crystal structure (fig. s ), its buried area upon dimerization was calculated to be % by the pisa server, engaging in van der waals interactions with other residues of ec - .
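the monomer-dimer argument in the sec-mals paragraph above can be made concrete with a small calculation. the sketch assumes an ideal two-state equilibrium, kd = [m]^2/[d], with the total concentration expressed in monomer equivalents, total = [m] + 2[d]; the function names are ours and the numbers in the usage lines are illustrative, not values from the study.

```python
import math

def monomer_concentration(total, kd):
    """Free monomer concentration for total = [M] + 2[D] and kd = [M]^2 / [D].

    `total` and `kd` must share the same concentration unit.
    """
    return kd * (math.sqrt(1.0 + 8.0 * total / kd) - 1.0) / 4.0

def dimer_mass_fraction(total, kd):
    """Fraction of the protein mass present as dimer."""
    return 1.0 - monomer_concentration(total, kd) / total

def weight_average_mw(total, kd, monomer_mw):
    """Weight-average molecular weight of the monomer-dimer mixture."""
    return monomer_mw * (1.0 + dimer_mass_fraction(total, kd))

# Illustrative values only: concentrations given relative to kd = 1.0.
below_kd = dimer_mass_fraction(total=0.1, kd=1.0)   # mostly monomer
at_kd = dimer_mass_fraction(total=1.0, kd=1.0)      # half of the mass in dimer
above_kd = dimer_mass_fraction(total=10.0, kd=1.0)  # mostly dimer
```

under these assumptions, a sample eluting below its kd is mostly monomer, so its weight-average mw lies well below that of the dimer, and a mutation that raises kd lowers it further only modestly. this is consistent with the small mw differences discussed above.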
to understand the role of phe in dimerization of li-cadherin, we conducted md simulations of ec - wt and ec - f a in the monomeric state. we first calculated the intramolecular distance between cα atoms of the residues and . the simulations revealed that ala moves away from the strand that contains asn , whereas phe remained closer to asn (fig. a, fig. s and movie , ). this movement suggests that the side chain of phe forms an intramolecular interaction and is stabilized inside the pocket. superposition of ec (chain a) in the crystal structure of ec - and ec during the simulation of the ec - f a monomer suggests that the large movement of the loop including ala would cause steric hindrance and would inhibit dimerization (fig. b). thermal stability analysis using differential scanning calorimetry (dsc) revealed that ec - f a had two unfolding peaks whereas ec - wt had a single peak (fig. c). these results suggested that a part of the ec - f a molecule was destabilized by the mutation. in combination with the data from md simulations, we propose that phe contributes to dimerization of li-cadherin by restricting the movement of the residues around phe and thus preventing the steric hindrance caused by the large movement observed in the md simulations. dsc measurements showed that some of the other mutants have lower thermal stability than wild type (table and fig. s ).
however, because the tm of f a is the lowest among the mutants evaluated, and because other mutants displaying a lower tm than wild type did not exhibit a drastic decrease in homodimer affinity like f a, we conclude that among the residues evaluated by ala scanning, f was the most critical for the maintenance of homodimer structure and thermal stability.

functional analysis of li-cadherin on cells

to investigate whether disrupting the formation of the ec - homodimer influences cell adhesion, we established cho cell lines expressing full-length li-cadherin wt or the f a mutant (including the transmembrane and cytoplasmic domains, fused to gfp), which we termed ec - gfp and ec - f agfp, respectively (table s and fig. s ). we conducted cell aggregation assays and compared the cell adhesion ability of cells expressing each construct and mock cells (non-transfected flp-in cho) in the presence of calcium or in the presence of edta. the size distribution of cell aggregates was quantified using a micro-flow imaging (mfi) apparatus. ec - gfp showed cell aggregation ability in the presence of cacl . in contrast, ec - f agfp and mock cells did not show obvious cell aggregates in the presence of cacl (fig. a-c). these results show that f is crucial for li-cadherin-dependent cell adhesion and indicate that the ec - homodimer forms in the cellular environment.

differences between li-cadherin and classical cadherins

we next performed cell aggregation assays using cho cells expressing various constructs of li-cadherin in which domains were deleted, to elucidate the mechanism of cell adhesion induced by li-cadherin. li-cadherin ec - and ec - expressing cells were separately established (ec - gfp and ec - gfp) (table s and fig. s ). importantly, neither ec - nor ec - expressing cells showed cell aggregation ability in the presence of cacl (fig. ). ec - expressing cells did not aggregate, suggesting that effective dimerization requires full-length protein.
the ec - homodimer observed by x-ray crystallography and detected by auc cannot be replicated by the ec - construct in a cellular environment, suggesting that the overhanging ec domain in the dimer belonging to one cell collides with the membrane of the opposing cell (steric hindrance) (fig. s a). it is also possible that an inappropriate orientation of the approaching li-cadherin molecules contributes to the inability of ec - to dimerize (fig. s b). an alternative possibility is that the weaker dimerization of ec - (detected by auc) cannot maintain cell adhesion due to the mobility of the ca +-free linker between ec and ec . contrary to the canonical ca +-bound linker, such as the linker between ec and ec , the linker between ec and ec in li-cadherin does not possess a ca +. the lack of ca + resulted in greater mobility when the ec - homodimer observed in the crystal structure (fig. ) was not formed. the combination of low dimerization affinity and high mobility likely explains the absence of ec - driven cell adhesion (fig. s c, d). expression of ec - on the surface of the cells did not result in cell aggregation, an observation agreeing with the results of auc and saxs, which show that ec - does not form a dimer. the truncation of ec - from li-cadherin generates a cadherin resembling classical cadherins in that it has five extracellular domains and a trp residue at the n-terminus. together with the crystal structure of the ec - homodimer, which showed that trp was buried in its own hydrophobic pocket and was not participating in homodimerization (fig.
s ), the fact that li-cadherin ec - did not aggregate points to a unique dimerization mechanism in li-cadherin. ec - and ec - expressing cells did not show aggregation ability even when they were mixed in equal amounts (fig. s ). this result excluded the possibility of nonsymmetrical interaction of the domains (e.g. ec - and ec - , ec - and ec - , etc.).

discussion

here, we show for the first time the homodimer architecture of li-cadherin ec - and the flexibility of the ca +-free linker in the li-cadherin monomer. the x-dimer and the strand-swap dimer formed by classical cadherins do not seem effective for driving li-cadherin-dependent cell adhesion, as these dimers would allow large movements at the ca +-free linker even if the dimer was formed. we assume that the unique architecture of the li-cadherin ec - homodimer is necessary to restrict the movement of the ca +-free linker and so maintain li-cadherin-dependent cell adhesion (fig. s ). several differences between li-cadherin and e-cadherin might explain the existence of the non-canonical ca +-free linker. both li-cadherin and e-cadherin are expressed on normal intestine cells; however, their expression sites are different. li-cadherin is expressed at the intercellular cleft and is excluded from the adherens junction ( ), where e-cadherin is expressed ( ). even though li-cadherin is excluded from the adherens junction, trans-interaction of li-cadherin is necessary to maintain water transport through the intercellular cleft of intestine cells ( ). clustering on the cell membrane might also be different.
classical cadherins including e-cadherin are considered to form clusters on the cell membrane to achieve cell adhesion ( ). the lateral interaction interface of these cadherins was estimated from the crystal lattices. in contrast, we did not observe any crystal packing which suggests lateral interaction in our crystal structure. indeed, our crystal structure shows that the n-terminal sugar chains are extended toward the opposite side of the homodimer interface, and this suggests that each homodimer does not participate in cis-interaction. considering that the interface areas of the x-dimer and strand-swap dimer are much smaller than that of the li-cadherin ec - dimer, we speculate that li-cadherin forms homodimers with a broader interface to be able to maintain trans-interaction without formation of clusters on the cell membrane. expression of li-cadherin is also observed on various cancer cells such as gastric adenocarcinoma, colorectal cancer cells and pancreatic cancer cells ( , , ). the roles of li-cadherin on cancer cells have been discussed previously. for example, it was shown that inoculation of li-cadherin gene (cdh )-silenced cells in nude mice inhibited the progression of colorectal cancer ( ). in the case of gastric cancer, li-cadherin-positive tumors were significantly larger than li-cadherin-negative tumors ( ). considering that loss of cell adhesion ability through the downregulation of e-cadherin by epithelial mesenchymal transition (emt) is often observed in cancer cells ( , ), the fact that li-cadherin is upregulated in various types of cancer cells suggests that li-cadherin acts differently from e-cadherin on cancer cells. the unique architecture of the li-cadherin homodimer and the absence of interactions with intracellular (cytoplasmic) proteins ( ) suggest a distinctive role of li-cadherin in cancer cells with respect to that of classical cadherins. in summary, our study shows novel characteristics of li-cadherin at the molecular level.
our results suggest that molecules targeting the interface of the li-cadherin homodimer would abrogate li-cadherin-dependent cell adhesion. on the other hand, we estimate that molecules which restrict the movement of the ca +-free linker might strengthen li-cadherin-dependent cell adhesion by stabilizing the li-cadherin homodimer.

experimental procedures

protein sequence

the amino acid sequences of the recombinant proteins and the li-cadherin-expressing cho cells are summarized in table s .

expression and purification of recombinant li-cadherin

all li-cadherin constructs were expressed and purified using the same method. all constructs were cloned in pcdna . vector (thermofisher scientific). recombinant protein was expressed using expi ftm cells (thermofisher scientific) following the manufacturer's protocol. cells were cultured for three days after transfection at °c and % co . the supernatant was collected and filtered, followed by dialysis against a solution composed of mm tris-hcl at ph . , mm nacl, and mm cacl . immobilized metal affinity chromatography was performed using ni-nta agarose (qiagen). protein was eluted with mm tris-hcl at ph . , mm nacl, mm cacl , and mm imidazole. final purification was performed by size exclusion chromatography (sec) using a hiload / superdex pg column (cytiva) at °c equilibrated in buffer a ( mm hepes-naoh at ph . , mm nacl, and mm cacl ). unless otherwise specified, samples were dialyzed against buffer a before analysis and filtered dialysis buffer was used for assays.
sedimentation velocity analytical ultracentrifugation (sv-auc) sv-auc experiments were conducted using the optima auc (beckman coulter) equipped with an -hole an ti rotor at °c with , . , , , , , and µm of ec - , ec - , ec - and ec - dissolved in buffer a. protein sample ( µl) was loaded into the sample sector of a cell equipped with sapphire windows and a mm double-sector charcoal-filled epon centerpiece. a volume of µl of buffer was loaded into the reference sector of each cell. data were collected at , rpm with a radial increment of µm using a uv detection system. the collected data were analyzed using the continuous c(s) distribution model implemented in the program sedfit (version . b) ( ), fitting for the frictional ratio, meniscus, time-invariant noise, and radial-invariant noise using a regularization level of . . the sedimentation coefficient range of - s was evaluated with a resolution of . the partial specific volumes of ec - , ec - , ec - and ec - were calculated based on the amino acid composition of each sample using the program sednterp . ( ) and were . cm /g, . cm /g, . cm /g, and . cm /g, respectively. the buffer density and viscosity were calculated using the program sednterp . as . g/cm and . cp, respectively. figures of the c(s , w) distribution were generated using the program gussi (version . . ) ( ). the weight-average sedimentation coefficient of each sample was calculated by integrating over the range of sedimentation coefficients in which peaks with obvious concentration dependence were observed. for the determination of the dissociation constant of the monomer-dimer equilibrium, kd, the concentration dependence of the weight-average sedimentation coefficient was fitted to the monomer-dimer self-association model implemented in the program sedphat (version . b) ( ).
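the two analysis steps described for sv-auc, integrating a c(s) distribution to obtain a weight-average sedimentation coefficient and relating that value to a monomer-dimer equilibrium, can be illustrated with a minimal python sketch. this is not the sedfit or sedphat implementation: it assumes an ideal, rapidly equilibrating 2 m <-> d system in which monomer and dimer contribute to the signal in proportion to their mass, and it ignores hydrodynamic non-ideality. all function names and example numbers are hypothetical.

```python
import math


def weight_average_s(s_values, c_values, s_min, s_max):
    """Weight-average sedimentation coefficient over [s_min, s_max].

    Computes integral(c(s) * s ds) / integral(c(s) ds) with the
    trapezoidal rule on the grid points that fall inside the range.
    """
    pts = [(s, c) for s, c in zip(s_values, c_values) if s_min <= s <= s_max]
    if len(pts) < 2:
        raise ValueError("need at least two grid points inside the range")
    num = 0.0
    den = 0.0
    for (s0, c0), (s1, c1) in zip(pts, pts[1:]):
        ds = s1 - s0
        num += 0.5 * (c0 * s0 + c1 * s1) * ds
        den += 0.5 * (c0 + c1) * ds
    if den == 0.0:
        raise ValueError("c(s) integrates to zero in the chosen range")
    return num / den


def monomer_fraction(c_total, kd):
    """Mass fraction of monomer for 2 M <-> D with Kd = [M]^2 / [D].

    c_total is the total protomer concentration, [M] + 2[D], in the
    same units as kd. Solves 2[M]^2 + Kd[M] - Kd*c_total = 0 for [M].
    """
    m = (-kd + math.sqrt(kd * kd + 8.0 * kd * c_total)) / 4.0
    return m / c_total


def isotherm_s(c_total, kd, s_monomer, s_dimer):
    """Ideal weight-average s at a given loading concentration."""
    f = monomer_fraction(c_total, kd)
    return f * s_monomer + (1.0 - f) * s_dimer


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the functions.
    print(weight_average_s([1.0, 2.0, 3.0], [0.0, 1.0, 0.0], 1.0, 3.0))
    print(isotherm_s(c_total=1.0, kd=1.0, s_monomer=4.0, s_dimer=6.0))
```

in this idealized model, kd would be estimated by adjusting kd, s_monomer and s_dimer until isotherm_s reproduces the measured weight-average values across the loading concentrations. when the total concentration equals kd, half of the protein mass is monomeric.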
solution structure analysis using saxs all measurements were performed at beamline bl- c ( ) of the photon factory (tsukuba, japan). the experimental procedure was described previously ( ). the concentration of ec - was µm. data were collected using a pilatus m (dectris). the wavelength was . Å and the camera distance was cm. the exposure time was seconds and raw data between s values of . and . Å- were measured. the background scattering intensity of the buffer was subtracted from each measurement. the scattering intensities of four measurements were averaged to produce the scattering curve of ec - . data were placed on an absolute intensity scale; the conversion factor was calculated based on the scattering intensity of water. the theoretical saxs curves and χ values were calculated using the foxs server ( , ). md simulation molecular dynamics simulations of li-cadherin were performed using gromacs . ( ) with the charmm m force field ( ). the whole crystal structure of the ec - homodimer, the ec - monomer form, the ec - f a monomer form and the ec - monomer form were used as the initial structures of the respective simulations. ec - and ec - of chain a were extracted from the ec - homodimer crystal structure to generate the ec - monomer form and the ec - monomer form, respectively. sugar chains were removed from the original crystal structure. missing residues were modelled with modeller . ( ). the structures were solvated with tip p water ( ) in a rectangular box such that the minimum distance to the edge of the box was Å under periodic boundary conditions using charmm-gui ( ). the addition of n-linked sugar chains (g f) and the mutation of phe in the ec - monomer to ala were also performed using charmm-gui ( , ).
the protein charge was neutralized with added na or cl, and additional ions were added to imitate a salt solution of concentration . m. each system was energy-minimized for steps and equilibrated in the nvt ensemble ( k) for ns. further simulations were performed in the npt ensemble at k. the time step was set to fs throughout the simulations. a cutoff distance of Å was used for coulomb and van der waals interactions. long-range electrostatic interactions were evaluated by means of the particle mesh ewald method ( ). covalent bonds involving hydrogen atoms were constrained by the lincs algorithm ( ). a snapshot was saved every ps. all trajectories were analyzed using gromacs tools. rmsd, dihedral angles, distances between two atoms and clustering were computed with the rms, gangle, distance and cluster modules, respectively. the convergence of the trajectories was confirmed by calculating rmsd values of c atoms (fig. s a, b and s a, b). because the molecule showed high flexibility at the ca +-free linker, the rmsd of each domain was calculated individually for the ec - wt monomer, the ec - f a monomer and the ec - dimer. five c atoms at the n-terminus were excluded from the calculation of the rmsd of ec because they were disordered. as the rmsd values were stable after ns of simulation, we did not consider the first ns when we analyzed the trajectories. generation of ec - _plus an md simulation of the ec - monomer was performed for ns. the trajectories from ns to ns were clustered using the ‘cluster’ tool of gromacs.
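as an illustration of this kind of trajectory post-processing, the following python sketch drops the frames recorded before a chosen start time and then picks, from the largest cluster, the frame with the smallest mean rmsd to the other members (the cluster medoid, a common choice of representative structure). it is a sketch under stated assumptions, not the gromacs cluster algorithm: the pairwise rmsd matrix and the cluster assignments are taken as given inputs, and every function name is hypothetical.

```python
def frames_after(times_ns, start_ns):
    """Indices of frames recorded at or after start_ns.

    Used to discard the initial part of a trajectory before analysis.
    """
    return [k for k, t in enumerate(times_ns) if t >= start_ns]


def medoid_index(rmsd, members):
    """Member with the smallest mean RMSD to the other members.

    rmsd is a symmetric matrix (list of lists) indexed by frame number;
    members is a list of frame numbers belonging to one cluster.
    """
    if not members:
        raise ValueError("cluster is empty")
    if len(members) == 1:
        return members[0]
    best = None
    best_mean = float("inf")
    for i in members:
        mean = sum(rmsd[i][j] for j in members if j != i) / (len(members) - 1)
        if mean < best_mean:
            best, best_mean = i, mean
    return best


def representative_frame(rmsd, clusters):
    """Medoid of the most populated cluster."""
    largest = max(clusters, key=len)
    return medoid_index(rmsd, largest)


if __name__ == "__main__":
    # Hypothetical 4-frame example: frames 0-2 are similar, frame 3 is an outlier.
    rmsd = [
        [0.0, 1.0, 2.0, 9.0],
        [1.0, 0.0, 1.0, 9.0],
        [2.0, 1.0, 0.0, 9.0],
        [9.0, 9.0, 9.0, 0.0],
    ]
    print(frames_after([0.0, 10.0, 20.0, 30.0], 20.0))
    print(representative_frame(rmsd, [[0, 1, 2], [3]]))
```

a medoid is always an actual simulation frame, so it can be exported directly as a structure, whereas an averaged structure may contain distorted geometry.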
the structure that exhibited the smallest average rmsd from all other structures of the largest cluster was termed ec - _plus and was used for comparison with the solution data (saxs). crystallization of li-cadherin ec - purified li-cadherin ec - was dialyzed against mm hepes-naoh at ph . , mm nacl, and mm cacl . after dialysis, the protein was concentrated to µm. optimal crystallization conditions were screened using an oryx instrument (douglas instruments) and commercial screening kits (hampton research). the crystal used for data collection was obtained in a crystallization solution containing mm sodium sulfate decahydrate and % w/v polyethylene glycol , at °c. suitable crystals were harvested, briefly incubated in mother liquor supplemented with % glycerol, and transferred to liquid nitrogen for storage until data collection. data collection and refinement diffraction data from a single crystal of ec - were collected at beamline bl- a of the photon factory (tsukuba, japan) under cryogenic conditions ( k). diffraction images were processed with the program mosflm and merged and scaled with the program scala ( ) of the ccp suite ( ). the structure of the wt protein was determined by the molecular replacement method using the coordinates of p-cadherin (pdb entry code zmy) ( ) with the program phaser ( ). the models were refined with the program refmac ( ) and built manually with coot ( ). validation was carried out with procheck ( ). data collection and structure refinement statistics are given in table . ucsf chimera was used to render all of the molecular graphics ( ). site-directed mutagenesis mutations were introduced into the plasmid as described previously ( ). size exclusion chromatography with multi-angle light scattering (sec-mals) the molecular weight of li-cadherin was
determined using a superose / gl column (cytiva) with an inline dawn + multi angle light scattering (mals) detector (wyatt technology), a uv detector (shimadzu), and a refractive index (ri) detector (shodex). protein samples ( µl) were injected at µm or µm. analysis was performed using astra software (wyatt technology). the concentration at the end of the chromatographic column was measured based on the uv absorbance. the protein conjugate method was employed for the analysis because sugar chains were bound to li-cadherin. all detectors were calibrated using bovine serum albumin (bsa) (sigma-aldrich). comparison of thermal stability by dsc dsc measurements were performed using a microcal vp-capillary dsc (malvern). the measurement was performed from °c to °c at a scan rate of °c min- . data were analyzed using origin software. establishment of cho cells expressing li-cadherin for all human li-cadherin constructs for which stable cell lines were established, the dna sequence of monomeric gfp was fused at the c-terminus and the fusion was cloned into the pcdnatm /frt vector (thermofisher scientific). cho cells stably expressing li-cadherin-gfp were established using the flp-intm-cho cell line following the manufacturer’s protocol (thermofisher scientific). cloning was performed by the limiting dilution-culture method. cells expressing gfp were selected and cultivated. the cells were observed with an in cell analyzer (cytiva).
the cells were cultivated in ham’s f- nutrient mixture (thermofisher scientific) supplemented with % fetal bovine serum (fbs), % l-glutamine or % glutamaxtm-i (thermofisher scientific), % penicillin-streptomycin, and . mg ml- hygromycin b at °c and . % co . cell imaging cells ( µl) were added to a -well plate (greiner) at x cells ml- and cultured overnight. after washing the cells with wash medium (ham’s f- nutrient mixture (thermofisher scientific) supplemented with % fetal bovine serum (fbs), % glutamaxtm-i, % penicillin-streptomycin), hoechst (thermofisher scientific) ( µl) was added to each well at . µg ml- . the plate was incubated at room temperature for minutes. cells were washed twice with wash medium and twice with x hmf ( mm hepes-naoh at ph . , mm nacl, . mm kcl, . mm na hpo , mm cacl , and . mm glucose). after that, x hmf ( µl) was loaded into each well and images were taken with an in cell analyzer instrument (cytiva) using the fitc filter ( / excitation, / emission) and the dapi filter ( / excitation, / emission) with a x . na objective lens (nikon). cell aggregation assay the cell aggregation assay was performed by modifying methods described previously ( , ). cells were detached from the cell culture plate by adding x hmf supplemented with . % trypsin and placing the plate on a shaker at rpm for minutes at °c. fbs was added to a final concentration of % to stop the trypsinization. cells were washed once with x hmf supplemented with % fbs and twice with x hmf to remove trypsin. cells were suspended in x hmf at x cells ml- .
µl of the cell suspension was loaded into a -well plate coated with % w/v bsa. edta was added if necessary. after incubating the plate at room temperature for minutes, the -well plate was placed on a shaker at rpm for minutes at °c. micro-flow imaging (mfi) micro-flow imaging (brightwell technologies) was used to count the particles and to visualize the cell aggregates after the cell aggregation assay. after the cell aggregation assay described above, the plate was incubated at room temperature for minutes and µl of % paraformaldehyde phosphate buffer solution (nacalai tesque) was loaded into each well. the plate was incubated on ice for more than minutes. images of the cells were taken using an evos® xl core imaging system (thermofisher scientific) if necessary. after that, the cells were injected into the mfi instrument. data availability the coordinates and structure factors of li-cadherin ec - have been deposited in the protein data bank with entry code cym. all remaining data are contained within the article. acknowledgements we thank dr. s. kudo and dr. h. akiba for expert advice. we thank dr. o. kusano-arai, dr. h. iwanari and dr. t. hamakubo for providing us with the gene sequence of li-cadherin. funding and additional information the supercomputing resources in this study were provided by the human genome center at the institute of medical science, the university of tokyo, japan. this work was funded by a grant-in-aid for scientific research (a) h (k.t.) and a grant-in-aid for scientific research (b) h (k.t.)
from the japan society for the promotion of science, a grant-in-aid for scientific research on innovative areas h and h (k.t.) from the ministry of education, culture, sports, science and technology, and a grant-in-aid for jsps fellows j (a.y.) from the japan society for the promotion of science. we are grateful to the staff of the photon factory (tsukuba, japan) for excellent technical support. access to beamlines bl- a and bl- c was granted by the photon factory advisory committee (proposal numbers g and g ). conflict of interest the authors declare that they have no conflicts of interest with the contents of this article. references . takeichi, m. ( ) the cadherins: cell-cell adhesion molecules controlling animal morphogenesis. development. , – . van roy, f. ( ) beyond e-cadherin: roles of other cadherin superfamily members in cancer. nat. rev. cancer. , – . wendeler, m. w., praus, m., jung, r., hecking, m., metzig, c., and geßner, r. ( ) ksp-cadherin is a functional cell-cell adhesion molecule related to li-cadherin. exp. cell res. , – . hinoi, t., lucas, p. c., kuick, r., hanash, s., cho, k. r., and fearon, e. r. ( ) cdx regulates liver intestine-cadherin expression in normal and malignant colon epithelium and intestinal metaplasia. gastroenterology. , – . ko, s., chu, k. m., luk, j. m., wong, b. w., yuen, s. t., leung, s. y., and wong, j. ( ) overexpression of li-cadherin in gastric cancer is associated with lymph node metastasis. biochem. biophys. res. commun. , – .
matsusaka, k., ushiku, t., urabe, m., fukuyo, m., abe, h., ishikawa, s., seto, y., aburatani, h., hamakubo, t., kaneda, a., and fukayama, m. ( ) coupling cdh and cldn markers for comprehensive membrane-targeted detection of human gastric cancer. oncotarget. , – . berndorff, d., gessner, r., kreft, b., schnoy, n., lajous-petter, a. m., loch, n., reutter, w., hortsch, m., and tauber, r. ( ) liver-intestine cadherin: molecular cloning and characterization of a novel ca +-dependent cell adhesion molecule expressed in liver and intestine. j. cell biol. , – . weth, a., dippl, c., striedner, y., tiemann-boege, i., vereshchaga, y., golenhofen, n., bartelt-kirbach, b., and baumgartner, w. ( ) water transport through the intestinal epithelial barrier under different osmotic conditions is dependent on li-cadherin trans-interaction. tissue barriers. . / . . . jung, r., wendeler, m. w., danevad, m., himmelbauer, h., and geßner, r. ( ) phylogenetic origin of li-cadherin revealed by protein and gene structure analysis. cell. mol. life sci. , – . shapiro, l., fannon, a. m., kwong, p. d., thompson, a., lehmann, m. s., gerhard, g., als-nielsen, j., colman, d. r., and hendrickson, w. a. ( ) structural basis of cell-cell adhesion by cadherins. nature. , – . nagar, b., overduin, m., ikura, m., and rini, j. m. ( ) structural basis of calcium-induced e-cadherin rigidification and dimerization. nature. , – . kreft, b., berndorff, d., böttinger, a., finnemann, s., wedlich, d., hortsch, m., tauber, r., and geßner, r. ( ) li-cadherin-mediated cell-cell adhesion does not require cytoplasmic interactions. j. cell biol. , – . brasch, j., harrison, o. j., honig, b., and shapiro, l. ( ) thinking outside the cell: how cadherins drive adhesion. trends cell biol. , – . nicoludis, j. m., vogt, b. e., green, a. g., schärfe, c. p. i., marks, d. s., and gaudet, r. ( )
antiparallel protocadherin homodimers use distinct affinity-and specificity-mediating regions in cadherin repeats - . elife. , – . harrison, o. j., bahna, f., katsamba, p. s., jin, x., brasch, j., vendome, j., ahlsen, g., carroll, k. j., price, s. r., honig, b., and shapiro, l. ( ) two-step adhesive binding by classical cadherins. nat. struct. mol. biol. , – . parisini, e., higgins, j. m. g., liu, j. huan, brenner, m. b., and wang, j. huai ( ) the crystal structure of human e-cadherin domains and , and comparison with other cadherins in the context of adhesion mechanism. j. mol. biol. , – . boggon, t. j., murray, j., chappuis-flament, s., wong, e., gumbiner, b. m., and shapiro, l. ( ) c-cadherin ectodomain structure and implications for cell adhesion mechanisms. science ( -. ). , – . kudo, s., caaveiro, j. m. m., miyafusa, t., goda, s., ishii, k., matsuura, t., sudou, y., kodama, t., hamakubo, t., and tsumoto, k. ( ) structural and thermodynamic characterization of the self-adhesive properties of human p-cadherin. mol. biosyst. , – . jin, x., walker, m. a., felsövályi, k., vendome, j., bahna, f., mannepalli, s., cosmanescu, f., ahlsen, g., honig, b., and shapiro, l. ( ) crystal structures of drosophila n-cadherin ectodomain regions reveal a widely used class of ca +-free interdomain linkers. proc. natl. acad. sci. u. s. a. , e –e . araya-secchi, r., neel, b. l., and sotomayor, m. ( ) an elastic element in the protocadherin- tip link of the inner ear. nat. commun. . /ncomms . harrison, o.
j., jin, x., hong, s., bahna, f., ahlsen, g., brasch, j., wu, y., vendome, j., felsovalyi, k., hampton, c. m., troyanovsky, r. b., ben-shaul, a., frank, j., troyanovsky, s. m., shapiro, l., and honig, b. ( ) the extracellular architecture of adherens junctions revealed by crystal structures of type i cadherins. structure. , – . krissinel, e., and henrick, k. ( ) inference of macromolecular assemblies from crystalline state. j. mol. biol. , – . harrison, o. j., bahna, f., katsamba, p. s., jin, x., brasch, j., vendome, j., ahlsen, g., carroll, k. j., price, s. r., honig, b., and shapiro, l. ( ) two-step adhesive binding by classical cadherins. nat. struct. mol. biol. , – . boller, k., vestweber, d., and kemler, r. ( ) cell-adhesion molecule uvomorulin is localized in the intermediate junctions of adult intestinal epithelial cells. j. cell biol. , – . grötzinger, c., kneifel, j., patschan, d., schnoy, n., anagnostopoulos, i., faiss, s., tauber, r., wiedenmann, b., and geßner, r. ( ) li-cadherin: a marker of gastric metaplasia and neoplasia. gut. , – . liu, x., huang, y., yuan, h., qi, x., manjunath, y., avella, d., kaifi, j. t., miao, y., li, m., jiang, k., and li, g. ( ) disruption of oncogenic liver-intestine cadherin (cdh ) drives apoptotic pancreatic cancer death. cancer lett. , – . bartolomé, r. a., barderas, r., torres, s., fernandez-aceñero, m. j., mendes, m., garcía-foncillas, j., lopez-lucendo, m., and casal, j. i. ( ) cadherin- interacts with α β integrin to regulate cell proliferation and adhesion in colorectal cancer cells causing liver metastasis. oncogene. , – .
wang, j., yu, j. c., kang, w. m., wang, w. z., liu, y. q., and gu, p. ( ) the predictive effect of cadherin- on lymph node micrometastasis in pn gastric cancer. ann. surg. oncol. , – . huang, r. y. j., guilford, p., and thiery, j. p. ( ) early events in cell adhesion and polarity during epithelial-mesenchymal transition. j. cell sci. , – . lamouille, s., xu, j., and derynck, r. ( ) molecular mechanisms of epithelial–mesenchymal transition. nat. rev. mol. cell biol. , – . kreft, b., berndorff, d., böttinger, a., finnemann, s., wedlich, d., hortsch, m., tauber, r., and geßner, r. ( ) li-cadherin-mediated cell-cell adhesion does not require cytoplasmic interactions. j. cell biol. , – . schuck, p. ( ) size-distribution analysis of macromolecules by sedimentation velocity ultracentrifugation and lamm equation modeling. biophys. j. , – . laue, t. m., shah, b., ridgeway, t. m., and pelletier, s. l. ( ) computer-aided interpretation of analytical sedimentation data for proteins. anal. ultracentrifugation biochem. polym. sci. . brautigam, c. a. ( ) chapter five - calculations and publication-quality illustrations for analytical ultracentrifugation data. in methods in enzymology (cole, j. l. b. t.-m. in e. ed), pp. – , academic press, , – . schuck, p. ( ) on the analysis of protein self-association by sedimentation velocity analytical ultracentrifugation. anal. biochem. , – . shimizu, n., mori, t., nagatani, y., ohta, h., saijo, s., takagi, h., takahashi, m., yatabe, k., kosuge, t., and igarashi, n. ( ) bl- c, the small-angle x-ray scattering beamline at the photon factory. aip conf. proc. . / . . schneidman-duhovny, d., hammel, m., tainer, j. a., and sali, a. ( ) accurate saxs profile computation and its assessment by contrast variation experiments. biophys. j. , – . schneidman-duhovny, d., hammel, m., tainer, j. a., and sali, a. ( ) foxs, foxsdock and
multifoxs: single-state and multi-state structural modeling of proteins and their complexes based on saxs profiles. nucleic acids res. , w –w . abraham, m. j., murtola, t., schulz, r., páll, s., smith, j. c., hess, b., and lindahl, e. ( ) gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. softwarex. – , – . huang, j., rauscher, s., nawrocki, g., ran, t., feig, m., de groot, b. l., grubmüller, h., and mackerell, a. d. ( ) charmm m: an improved force field for folded and intrinsically disordered proteins. nat. methods. , – . eswar, n., webb, b., marti-renom, m. a., madhusudhan, m. s., eramian, d., shen, m., pieper, u., and sali, a. ( ) comparative protein structure modeling using modeller. curr. protoc. bioinforma. , . . - . . . jorgensen, w. l., chandrasekhar, j., madura, j. d., impey, r. w., and klein, m. l. ( ) comparison of simple potential functions for simulating liquid water. j. chem. phys. , – . jo, s., kim, t., iyer, v. g., and im, w. ( ) charmm-gui: a web-based graphical user interface for charmm. j. comput. chem. , – . jo, s., song, k. c., desaire, h., mackerell, a. d., and im, w. ( ) glycan reader: automated sugar identification and simulation preparation for carbohydrates and glycoproteins. j. comput. chem. , – . darden, t., york, d., and pedersen, l. ( ) particle mesh ewald: an n·log(n) method for ewald sums in large systems. j. chem. phys. , – . hess, b., bekker, h., berendsen, h. j. c., and fraaije, j. g. e. m. ( ) lincs: a linear constraint solver for molecular simulations. j. comput. chem. , – . evans, p.
( ) scaling and assessment of data quality. acta crystallogr. sect. d biol. crystallogr. , – . winn, m. d., ballard, c. c., cowtan, k. d., dodson, e. j., emsley, p., evans, p. r., keegan, r. m., krissinel, e. b., leslie, a. g. w., mccoy, a., mcnicholas, s. j., murshudov, g. n., pannu, n. s., potterton, e. a., powell, h. r., read, r. j., vagin, a., and wilson, k. s. ( ) overview of the ccp suite and current developments. acta crystallogr. sect. d biol. crystallogr. , – . kudo, s., caaveiro, j. m. m., and tsumoto, k. ( ) adhesive dimerization of human p-cadherin catalyzed by a chaperone-like mechanism. structure. , – . mccoy, a. j., grosse-kunstleve, r. w., adams, p. d., winn, m. d., storoni, l. c., and read, r. j. ( ) phaser crystallographic software. j. appl. crystallogr. , – . murshudov, g. n. ( ) refinement of macromolecular structures by the maximum-likelihood method. acta crystallogr. , – . emsley, p., lohkamp, b., scott, w. g., and cowtan, k. ( ) features and development of coot. acta crystallogr. sect. d biol. crystallogr. , – . laskowski, r. a. ( ) procheck-a program to check the stereochemical quality of protein structures. j. appl. crystallogr. , – . pettersen, e. f., goddard, t. d., huang, c. c., couch, g. s., greenblatt, d. m., meng, e. c., and ferrin, t. e. ( ) ucsf chimera—a visualization system for exploratory research and analysis. j. comput. chem. , – . yui, a., akiba, h., kudo, s., nakakido, m., nagatoishi, s., and tsumoto, k. ( ) thermodynamic analyses of amino acid residues at the interface of an antibody b a and its antigen roundabout homolog . j.
biochem. . /jb/mvx . urushihara, h., takeichi, m., hakura, a., and okada, t. s. ( ) different cation requirements for aggregation of bhk cells and their transformed derivatives. j. cell sci. , – . urushihara, h., ozaki, h. s., and takeichi, m. ( ) immunological detection of cell surface components related with aggregation of chinese hamster and chick embryonic cells. dev. biol. , – . table : data collection and refinement statistics. statistical values given in parenthesis refer to the highest resolution bin. data collection li-cadherin (ec - ) space group p unit cell a, b, c (Å) . , . , . α, β, γ (°) . , . , . resolution (Å) . - . ( . - . ) wavelength . observations , ( , ) unique reflections , ( , ) rmerge. . ( . ) rp.i.m. . ( . ) cc / . ( . ) i / σ (i) . ( . ) multiplicity . ( . ) completeness (%) . ( . ) refinement statistics resolution (Å) . - . rwork / rfree (%) . / . no. protein chains no. atoms protein , ca + water b-factor (Å ) protein . ca + . water . ramachandran plot preferred (%) . allowed (%) . outliers (%) rmsd bond (Å) . rmsd angle (°) . pdb entry code cym
table : results of ala scanning sec-mals dsc sample concentration (µm) mw (kda) concentration (µm) tm (°c) tm (°c) wt . . . ± . n.d. . i a . . . ± . n.d. . l a . . . ± . n.d. . n a . . . ± . . ± . . v a . . . ± . . ± . . n a . . . ± . . ± . . f a . . . ± . . ± . . l a . . . ± . . ± . . n a . . . ± . n.d. n.d. f a . . . ± . n.d. . y a . . . ± . n.d. . q a . . . ± . n.d. n.d. . the molecular weight of the protein does not include the glycan moiety. the theoretical molecular weight of ec - wild type without glycan is . kda. . tm ± error is shown. . not determined. figure . schematic view of extracellular cadherin (ec) domains of classical cadherin and li-cadherin. domains connected by dotted lines have sequence homology. ec - , ec - and ec - , which were used for the experiments are indicated by parentheses. figure . analysis on dimerization state of li-cadherin ec - , ec - and ec - . a. sedimentation plot of sv-auc. dimerization of ec - and ec - was confirmed. kd of ec - and ec - homodimer was . µm and . µm, respectively. b. scattering curve of saxs and theoretical curve of ec - calculated from modified crystal structure.
the method used to produce the modified crystal structure is explained in experimental procedures and supplementary information. figure . crystal structure of the ec - homodimer. calcium ions are depicted in magenta. no calcium ions were observed between domains ec and ec in either molecule. four partial n-glycans were modeled in chain a (light green) and two in chain b (cyan) (the amino acid sequence of ec - is given in table s ). figure . computational analysis of the flexibility of the calcium-free linker. a. schematic view of how rmsd values were calculated. b. rmsd values of ec c or ec c against ec c. chain a of the ec - dimer structure was employed as the initial structure. c. rmsd values of ec c or ec c in chain a of the dimer structure against ec c in chain a. rmsd values and standard deviations are shown in parentheses in angstrom units.
figure . residues involved in the intermolecular interaction in ec - homodimer crystal structure. the non-polar interaction residues are shown in black and purple rectangles (top panels). residues involved in hydrogen bonds (black solid lines) are shown within the red and blue rectangles (bottom panel). residues indicated with an asterisk were individually mutated to ala to evaluate their contribution to dimerization.
figure . mutagenesis analysis by sec-mals. a. molecular weight measured by mals. f a exhibited the smallest molecular weight among all constructs tested. the samples were injected at µm. error bars indicate experimental uncertainties. b. sec chromatograms obtained using sec-mals. protein was injected at µm. chromatogram of wt and f a are indicated in black (bold line) and green, respectively. elution volume of the peak top of f a was the largest among all constructs. c. sec chromatogram and mw plots of ec - wt and f a. graphs of other mutants are shown in fig. s c.
figure . mechanism of dimerization facilitated by phe . a.
the distance between phe (orange) c or ala (purple) c and asn (grey) c was evaluated by md simulations. the c atoms are indicated by black circles. the distance calculated by the simulations is indicated with a red line. the time course is shown in the upper part of the panel, above each structure. each md simulation run is shown in red, black and blue. averages and standard deviations from to ns of each simulation are shown in parentheses. b. structure alignment of ec (chain a) in ec - homodimer crystal structure and ec during the md simulation of ec - f a monomer. a snapshot of . ns in run was chosen as it showed the largest distance between asn and ala . ala is indicated in purple. the loop indicated with the black arrow would cause steric hindrance towards the formation of the homodimer. c. thermal stability of ec - wt and f a determined by differential scanning calorimetry. two transitions appeared in the sample of f a. the first transition at lower temperature seems to have appeared due to the loss of intramolecular interaction around the residue at position .
figure . cell aggregation assay. a. size distribution of cell aggregates determined by mfi. particles that were µm or larger were regarded as cell aggregates. only ec - wt expressing cells in the presence of mm cacl showed a significant number of cell aggregates that were µm or larger. data show the mean ± sem of four measurements. b. microscopic images of cell aggregates taken after adding % pfa and incubating the plate on ice for minutes. c. images of cell aggregates taken by mfi.
cell aggregates belonging to the largest size population of each construct obtained in the presence of mm cacl ( ~ m for ec - gfp, ~ m for ec - f agfp and ~ m for flp-in cho) are shown.
figure . cell adhesion mediated by short constructs. cell aggregation assay using ec - gfp and ec - gfp. ec - gfp and flp-in cho (mock cells) were used as positive and negative control, respectively. particles that were µm or larger were considered as cell aggregates. the number of cell aggregates of both ec - gfp and ec - gfp in the presence or absence of ca + were determined. data show mean ± sem of four measurements.
priming mycobacterial esx-secreted protein b to form a channel-like structure
abril gijsbers , vanesa vinciauskaite , axel siroy ,†, ye gao , giancarlo tria ,‡, anjusha mathew , nuria sánchez-puig , , carmen lópez-iglesias , peter j. peters * and raimond b. g.
ravelli *
division of nanoscopy, maastricht multimodal molecular imaging institute (m i), maastricht university, universiteitssingel , er, maastricht, the netherlands
division of imaging mass spectrometry, maastricht multimodal molecular imaging institute (m i), maastricht university, universiteitssingel , er, maastricht, the netherlands
departamento de biomacromoléculas instituto de química, universidad nacional autónoma de méxico, ciudad universitaria, ciudad de méxico, méxico
†present address: european institute of chemistry and biology (iecb), pessac, france
‡present address: dipartimento di chimica "ugo schiff", università degli studi di firenze, via della lastruccia, - i- sesto fiorentino, italia
*corresponding author: rbg.ravelli@maastrichtuniversity.nl (rbgr), pj.peters@maastrichtuniversity.nl (pjp)
.cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . /
abstract
esx- is a major virulence factor of mycobacterium tuberculosis, a secretion machinery directly involved in the survival of the microorganism from the immune system defence. it disrupts the phagosome membrane of the host cell through a contact-dependent mechanism. recently, the structure of the inner-membrane core complex of the homologous esx- and esx- was resolved; however, the elements involved in the secretion through the outer membrane or those acting on the host cell membrane are unknown. protein substrates might form this missing element. here, we describe the oligomerisation process of the esx- substrate espb, which occurs upon cleavage of its c-terminal region and is favoured by an acidic environment.
cryo-electron microscopy data are presented which show that espb from different mycobacterial species has a conserved quaternary structure, except for the non-pathogenic species m. smegmatis. espb assembles into a channel with dimensions and characteristics suitable for the transit of esx- substrates, as shown by the presence of another espb trapped within. our results provide insight into the structure and assembly of espb, and suggest a possible function as a structural element of esx- .
keywords
cryo-em, espb, esx- , mycobacteria, preferential orientation
introduction
tuberculosis (tb) is an infectious disease caused by the bacillus mycobacterium tuberculosis. it is estimated that one-quarter of the world’s population is currently infected with latent bacteria. in , million people developed the disease from which . million were caused by multidrug-resistant strains. even though tb is curable, . million people succumb to it every year (world health organization, ). the current treatment is long and has serious side effects, often driving the patient to terminate the therapy before its conclusion (schaberg et al., ). this has contributed to an increase in the number of patients suffering from multidrug- and extensively drug-resistant tb. while treatment is available for some of these resistant strains, the regimen is usually longer, more expensive and sometimes more toxic. for this reason, research on mycobacterial pathogenesis is vital to find a proper target in order to develop more effective therapeutics and vaccines.
the high incidence of tb relates to the ability of m. tuberculosis to evade the host immune system (ferluga et al., ). this ability is related to multiple factors, one of which is a complex cell envelope with low permeability that plays a crucial role in drug resistance and in survival under harsh conditions (brennan & nikaido, ). likewise, pathogenic mycobacteria secrete virulence factors that manipulate the environment and compromise the host immune response. mycobacteria have up to five specialised secretion machineries that carry out this process, named esx- to - (together known as the type vii secretion system or t ss). the core components of the inner-membrane part of t ss have been identified (pym et al., ). nevertheless, it remains unknown whether the translocation of substrates through the inner and outer membrane is functionally coupled or not (one- or two-step, respectively), and if it deploys a specific outer-membrane complex to do so (bunduc et al., a). proteins from the pe/ppe family, characterised by pro-glu and pro-pro-glu motifs and secreted by t ss, are often associated with the outermost layer of the mycobacterial cell envelope, and have been suggested to play a role in membrane channel formation (burggraaf et al., ; cascioferro et al., ; wang et al., ). recently, the intake of nutrients by m. tuberculosis was shown to be dependent on pe/ppe proteins, suggesting that these form small molecule–selective porins that allow the bacterium to take up nutrients over an otherwise impermeable barrier (wang et al., ).
esx- to - are paralogue protein complexes with specialised functions and substrates, unable to complement each other (abdallah et al., ; phan et al., ). esx- is an essential player in the virulence of m. tuberculosis. it has been implicated in phagosomal escape, cellular inflammation, host cell death, and dissemination of the bacteria to neighbouring cells (abdallah et al., ; houben et al., a; simeone et al., ; stanley et al., ; van der wel et al., ). our knowledge about the structure of the machinery as well as the mechanism of secretion and regulation remains limited. esx- is involved in iron homeostasis (siegrist et al., ), and only recently the molecular architecture of its inner-membrane core has been determined (famelis et al., ; poweleit et al., ). the complex consists of a dimer of protomers, made of four proteins: esx-conserved component (ecc)-b, c, d(x ), and e. despite the resolution achieved in both studies, there was no obvious channel through which the protein substrates can traverse. rosenberg and collaborators have described that one of the elements of the secretion system (eccc) forms dimers upon substrate binding, which then form higher-order oligomers (rosenberg et al., ). this is in agreement with observations that esx- , which is involved in nutrient uptake (ates et al., ) and host cell death (abdallah et al., ), forms a hexamer (beckham et al., ; bunduc et al., b; houben et al., b). a recent structure of the esx- hexamer shows that it is stabilised by a mycosin protease (mycp) positioned in the periplasm on top of eccb (bunduc et al., b). esx- and esx- are the least characterised, where esx- is involved in dna transfer (gray et al., ) and is seen as being the ancestor of the five esx-systems (gey van pittius et al., ).
located in different positions in the genome, the esx loci contain the genes that code for the four ecc proteins, mycp, a heterodimer of esxa/b-like proteins, and one or more pe-ppe pairs. with high sequence similarity and conservation between paralogues (poweleit et al., ; van winden et al., ), one could expect the inner-membrane core of the different systems to share a similar architecture. so what makes each one of them unique? experimental data suggest that the answer lies with the substrates (lou et al., ). the esx- locus encodes more than ten unique proteins that are known to be secreted (sani et al., ), termed the esx- secretion-associated proteins (esp) (bitter et al., ). amongst those is espc, a protein present in pathogenic organisms that was described to form filamentous structures in vitro and to localise on the surface of the bacteria in vivo (lou et al., ). due to the similarities between espc and the needle protein of the type iii secretion system, lou et al. hypothesised that esx- could be an injectosome system with espc as its needle. this is of particular importance because, compared to the other systems, esx- function has been described to take place through a contact-dependent mechanism (conrad et al., ), which makes the discovery of an outer-membrane complex essential for understanding the system. other proteins, such as espe, which has been localised on the cell wall (carlsson et al., ; phan et al., ; sani et al., ), are of interest as possible elements of the outer-membrane complex. the protein espb has been the focus of attention due to its ability to oligomerise upon secretion (korotkova et al., ; solomonson et al., ), making it a strong candidate as a structural component of the machinery (piton et al., ).
espb belongs to the pe/ppe family, but unlike other family members that form heterodimers in mycobacteria, espb consists of a single polypeptide chain fusing the pe and ppe domains (korotkova et al., ). espb is a -kda protein that matures during secretion: its largely unstructured c-terminal region is cleaved in the periplasm by the protease mycp , leaving a mature -kda isoform (ohol et al., ; solomonson et al., ; xu et al., ). the purpose of this maturation is not yet clear, but it was shown that inactivation of mycp , and thus loss of espb cleavage, deregulates the secretion of proteins by esx- (ohol et al., ). chen et al. observed specific binding of espb to phosphatidylserine and phosphatidic acid after cleavage (chen et al., ), suggesting that the c-terminal processing of espb is important for its functioning, possibly involving lipid binding. the crystal structure of the monomeric n-terminal part of espb from m. tuberculosis and m. smegmatis has been determined: it forms a four-helix bundle with high structural homology between species (korotkova et al., ; solomonson et al., ). during the preparation of this work for publication, the structure of an espb oligomer from m. tuberculosis was published by piton et al., showing features of a pore-like transport protein (piton et al., ). espb is the only member of the pe/ppe family described to date to form higher-order oligomers. in this work, we studied the oligomerisation ability and structures of espb from m. tuberculosis, m. marinum, m. haemophilum and m. smegmatis.
we show that truncation of espb at the mycp cleavage site and an acidic environment promote the oligomerisation of espb from the three pathogenic species but not from non-pathogenic m. smegmatis. oligomerisation is mediated by intermolecular hydrogen bonds and amide bridges between residues highly conserved in the pathogenic species, but absent in m. smegmatis. the structures of oligomeric espb consist of two domains: an n-terminal region that forms a cylinder-like structure with a tunnel large enough to accommodate a folded pe-ppe pair, and a partly hydrophobic c-terminal region that interacts with hydrophobic surfaces. the oligomer has inner-pore dimensions similar to those described for the pore within the periplasmic region of esx- (bunduc et al., b). visualisation of a trapped espb monomer within the channel supports the idea that it could transit secreted proteins through its tunnel. overall, in this work we describe factors that prime the oligomerisation of espb, and provide insight into its potential role in the esx- machinery.
results
oligomerisation is favoured by an acidic ph and maturation of espb
previously, it has been described that oligomerisation of espb occurs after secretion (korotkova et al., ). in the infection context, this secretion would lead the protein to the phagosomal lumen of a macrophage, an organelle known to have ph acidification as a functional mechanism. to evaluate the putative role of ph in the oligomerisation process, the mature form of m.
tuberculosis espb (residues – ) was incubated at different ph values and analysed by size exclusion chromatography (sec). results showed that the equilibrium is favoured towards an oligomer form at ph . compared to ph . (fig a), as observed by a higher oligomer/monomer ratio at any protein concentration (fig b). native mass spectrometry experiments confirmed this behaviour and could identify different oligomeric states of espb, with the heptamer being the most predominant. intermediate states were observed (dimer to pentamer) and even higher oligomeric states (octamer) but in lower abundance compared to the heptamer (fig c and d). because espb undergoes proteolytic processing of its c-terminus during secretion, we investigated the effect of this cleavage on the quaternary structure of different espb constructs, varying in their c-terminus lengths, from m. tuberculosis at ph . (fig ). with the exception of espb - that did not oligomerise, we observed that oligomerisation was favoured for all other constructs at ph . (fig b). the full-length espb - (fig b, blue trace) presented the lowest amounts of complex formation compared to the other constructs tested, while the highest amount was observed for the mature isoform, espb - (fig b, orange trace). these results suggest that mycp cleaves espb to allow oligomerisation, and that the remaining residues of the unstructured c-terminal region are needed, possibly, to stabilise the complex.
espb from m.
smegmatis is unable to oligomerise
to determine whether the oligomerisation ability is conserved across species, we performed cryo-em analysis on different orthologues of the mature espb. proteins from the pathogenic species m. tuberculosis, m. marinum and m. haemophilum were able to oligomerise into ring-like structures while the non-pathogenic m. smegmatis did not, as seen by the lack of visible particles (fig a); the structured region of an espb monomer ( kda) has a signal-to-noise ratio too low to be visualised within these micrographs (henderson, ; zhang et al., ). interestingly, comparison of the tertiary structure from the pathogenic species studied here with the published crystallographic model of espb from m. smegmatis did not show substantial differences (rmsd cα’s . – . Å), apart from an extended α-helix (fig b), absent in our oligomeric structures. to determine whether the differences in oligomerisation ability between m. smegmatis and the pathogenic species were due to variations in their primary structure, we performed sequence alignment of multiple espb orthologues. the species that presented oligomerisation showed high sequence identity whereas m. smegmatis has the lowest of all (fig c and fig ev ). because espb belongs to the pe/ppe family, we included in the analysis a pe-ppe pair with a structure already published (ekiert & cox, ). pe -ppe did not oligomerise (fig ), despite sharing a similar tertiary structure (rmsd cα’s . Å). given its low identity percentage ( . %), this confirms the importance of a specific amino acid sequence for the conservation of the quaternary structure.
high-resolution cryo-em structures of espb oligomers
next, we aimed to solve the high-resolution structure of espb oligomers by cryo-em. initial experiments were performed with espb - and espb - from m. tuberculosis, which displayed a very strong preferential orientation where only “top views” could be seen (fig a).
cryo-electron tomography revealed these molecules to be attached to the air-water interface (fig ev ). different oligomers were found: hexamers, heptamers, rings with an extra density in the middle, and octamers, with the heptameric ensemble being the predominant one (fig a), in agreement with the results obtained in solution (fig c). preliminary d reconstructions could be obtained from data that were collected at one or more tilt angles (fig b). data processing resulted in – Å resolution maps from which the first heptamer models were built. removal of the c-terminal region, ending the construct at residue , led to a different distribution of particles on cryo-em grids, now with random orientations (fig c), implying that this region interacts with the air-water interface on the em grid (noble et al., ) (fig ev ). experiments were repeated for constructs espb - from m. tuberculosis and the equivalent construct from m. marinum (at °-stage tilt), leading to high-resolution em maps of . Å and . Å average resolution, respectively (fig a and b, and fig ev a–c). we observed high structural conservation between the two structures. both displayed a four-helix bundle, like the esxa-b complex and pe -ppe complex, with the wxg and yxxxd located on one end of the elongated molecule, referred to as the top hereafter, making an h-bond interaction between the nitrogen of w with the oxygen of y , as was observed in the crystal structure (fig c) (korotkova et al., ). the helical tip is located on the opposite end, referred to as the bottom, for both espb and pe -ppe (korotkova et al., ; solomonson et al., ).
the c-terminal region starts near the top end of the elongated molecule. the overall structure shows seven copies tilted ° with respect to the symmetry-axes forming a cylinder-like oligomer with a width and a height of Å (fig a and b). the single particle analysis (spa) map from m. tuberculosis espb revealed three q-q and q-n interaction pairs between monomers (fig c). q was conserved in all espb orthologues analysed here with the exception of m. smegmatis, which showed no oligomerisation (fig c and fig ev ). a q a substitution in the m. tuberculosis orthologue resulted in the disruption of the oligomer (fig ev d, yellow trace) as evidenced by the absence of a high molecular weight peak by sec. amide bridges are not commonly seen within the structures present in the protein data bank (pdb) (joosten et al., ) but they are stronger interactions than typical hydrogen bonds and less affected by ph changes compared to salt bridges, another strong interaction (xie et al., ). in addition to the q-q and q-n interaction pairs, some hydrophobic interfacing residues were identified, including l and l . histidines, glutamates and aspartates were not found to interact directly with the neighbouring monomer. it seems that these residues mainly play a role in the ph-dependent overall charge distribution of the monomers (fig a-b). our high-resolution espb oligomer maps did not reveal a continuous density for the pe-ppe linker.
the proposed location of the linker within the crystal structure (korotkova et al., ) overlaps with the oligomerisation interface, and would need to adopt a different position upon oligomerisation. particle subtraction followed by focused classification showed partial densities for the linker at the periphery of the structure. we locked the pe-ppe linker in its crystal-structure position by making a double mutation, n c (in the core of the monomer) and t c (in the pe-ppe linker): this would prevent the linker from adopting the different conformation needed for oligomerisation. this double mutant abolished oligomerisation of espb, suggesting that an intramolecular bond was formed that prevented the linker from moving (fig ev d, red trace).
espb, a possible transport channel for t ss proteins
the espb cylinder-like structure has an internal pore diameter of Å (fig a-b), large enough to accommodate folded proteins such as esxa/esxb (diameter Å), pe -ppe (diameter Å) or an espb monomer itself (diameter Å). analysis of the degree of hydrophobicity in the structure showed that the internal surface of the oligomer is mainly hydrophilic (fig d), allowing other hydrophilic molecules to pass. during the cryo-em data processing, additional densities were consistently found within the espb heptamer of all the different constructs, including constructs that lack the c-terminal region. fig e shows a high-resolution d class of espb - with a well-defined density inside the channel from a subset of the data collected at °-stage tilt.
this d class was found in ~ % of the particles recorded. the d classes obtained at °-stage tilt could not be unambiguously manually assigned to specific oligomerisation forms. instead, d classification in relion (scheres, ) was used to identify one class with solely c symmetry and one class with an extra density within the heptameric channel. local symmetry averaging of the heptamer model while processing the overall map in c map revealed an extra density spanning the entire channel, in which we could fit an espb monomer model (fig g-h).
integrity of the pe-ppe linker is not essential for the oligomerisation of espb
to determine if the pe-ppe linker absent in our model was essential for oligomerisation, we performed limited proteolysis analysis on the m. tuberculosis constructs. incubation with trypsin fully digested espb - , perhaps due to its lower stability, but resulted in two major fragments for the constructs espb - and espb - , as shown by sds-page (fig ev a). n-terminal sequencing and mass spectrometry analysis revealed that the larger fragment corresponded to a section of the protein comprising residues v to r (corresponding to the ppe domain), while the smaller fragment included the n-terminal end of the protein sequence, with a few residues from the affinity tag, up to residue r (pe domain and linker) (fig ev b–d). despite being split within the pe-ppe linker into two fragments, espb - behaved in gel filtration as a single entity with the capacity to form oligomers (fig ev e and f), confirming that the integrity of this region is not necessary for the complex to form. it is noteworthy that trypsin did not cut before r , even though there are cleavage sites in the so-called unfolded c-terminal region, raising the question of whether this region is actually fully unstructured.
properties of the espb c-terminal region
the function of the c-terminal region has puzzled the scientific community for a long time, partly because it is the only substrate known to date of the mycp protease. here, we describe its processing as an important factor for the oligomerisation of the n-terminal region ( - ); however, the cleavage leaves ~ residues for no obvious reason. to gain insight into the properties of the c-terminal region that could hint at its function, and based on the preferential orientation effect seen in cryo-em, we performed a hydrophobicity analysis of this region on the different espb orthologues. the analysis revealed the presence of hydrophobic patches in the pathogenic species that are absent in espb from m. smegmatis (fig ev a). some of these patches are present in all the constructs with preferential orientation, leading us to speculate that residues – interact with the air-water interface of the cryo-em grid (noble et al., ). to understand whether this effect is related to a structural change or particular characteristic in the c-terminal region of the protein, we expressed a construct corresponding to residues – and carried out circular dichroism (cd) studies on it. far uv cd spectra analysis of this region showed a negative band around nm (fig ev b), characteristic of random coil structures. this result is in line with the high fraction ( %) of “disorder-promoting” residues within this region (lysine, glutamine, serine, glutamic acid, proline and glycine: amino acids commonly found in intrinsically disordered protein regions). interestingly, its proline content is . times higher than that observed for proteins in the pdb (theillet et al., ; uversky, ).
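The composition figures quoted above (fraction of disorder-promoting residues, proline enrichment relative to a reference) can be computed with a short script. The following is a minimal sketch, not the authors' analysis pipeline: the function names, the toy sequence and the reference proline fraction are all hypothetical placeholders, since neither the c-terminal sequence nor the reference value is given in the text. Only the residue set (K, Q, S, E, P, G) is taken from the paragraph above.

```python
from collections import Counter
from typing import Dict

# Disorder-promoting residues as listed in the text:
# lysine, glutamine, serine, glutamic acid, proline and glycine.
DISORDER_PROMOTING = frozenset("KQSEPG")


def residue_fractions(sequence: str) -> Dict[str, float]:
    """Return the fractional abundance of each residue in a one-letter sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def disorder_promoting_fraction(sequence: str) -> float:
    """Fraction of residues that belong to the disorder-promoting set."""
    fractions = residue_fractions(sequence)
    return sum(f for aa, f in fractions.items() if aa in DISORDER_PROMOTING)


def proline_enrichment(sequence: str, reference_fraction: float) -> float:
    """Ratio of the proline fraction in `sequence` to a reference fraction."""
    if reference_fraction <= 0:
        raise ValueError("reference_fraction must be positive")
    return residue_fractions(sequence).get("P", 0.0) / reference_fraction


# Hypothetical toy sequence and reference value, for illustration only;
# they are NOT the espb c-terminal sequence or a measured pdb average.
TOY_SEQUENCE = "PPGSKQEEAL"
TOY_REFERENCE_PROLINE_FRACTION = 0.05

if __name__ == "__main__":
    print(disorder_promoting_fraction(TOY_SEQUENCE))
    print(proline_enrichment(TOY_SEQUENCE, TOY_REFERENCE_PROLINE_FRACTION))
```

To apply this to real data, the toy sequence would be replaced with the construct's c-terminal sequence and the reference fraction with a value taken from a composition survey of structured proteins.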
comparative analysis of the cd difference spectra obtained at different ph [Δ (ph . – ph . )] revealed a positive signal close to nm and a negative signal near nm (fig ev b inset), showing that this region is able to adopt extended left-handed helical conformations [poly-l-proline type ii or ppii (rucker & creamer, )]. cd analysis of the c-terminal region in the presence of different concentrations of , , -trifluoroethanol (tfe) showed that this region has an intrinsic ability to attain helicity, based on the decrease in the ellipticity signal at nm (fig ev c) (luo & baldwin, ). the lack of a single isodichroic point at nm suggests that the conformational changes elicited by tfe do not comply with a two-state model and that the transition most probably involves an intermediate, e.g. the presence of more than one α-helix.

discussion

in the present study, we describe different factors that facilitate oligomerisation of espb: an acidic environment, the truncation of its c-terminal region, a flexible pe-ppe linker and the residues involved in the interaction. our findings are in agreement with previous observations that espb oligomerises upon secretion (korotkova et al., ). based on these results, the c-terminus of the full-length protein could prevent premature oligomerisation in the cytosol of mycobacteria, possibly through steric hindrance. however, this region is also likely to have other functions.
deletion of the espb c-terminus does not affect its own secretion (mclaughlin et al., ; xu et al., ) but rather the secretion of esxa/esxb, possibly by loss of interaction with the last residues of espb (xu et al., ). the sequence of the c-terminal end is highly conserved (fig ev ), which makes it possible that this region interacts with other molecules in the cytoplasm of the bacterium.

the ability of espb to oligomerise seems to be conserved across mycobacterial species, with the exception of m. smegmatis. this microorganism is a fast-growing, non-pathogenic species that uses the esx- system for horizontal dna transfer (flint et al., ). the exact mechanism of this transfer is unknown; however, evidence suggests that esx- is not the dna conduit but rather secretes proteins that act like pheromones, which in turn induce the expression of esx- genes, resulting in mating-pair interactions (gray et al., ). the esx- substrate esxa was shown to undergo a structural change that allows membrane insertion in m. tuberculosis when exposed to an acidic environment; however, this effect does not occur in its m. smegmatis orthologue (de leon et al., ; ma et al., ). taking into account these earlier findings and the oligomerisation differences between espb proteins observed in this work, it is plausible that the mechanism of action of esx- is distinct between these two species.

espb interacts with the lipids phosphatidylserine and phosphatidic acid (chen et al., ).
it was suggested that espb could transport phosphatidic acid (piton et al., ), but the interior of the complex is mainly hydrophilic, making this scenario less plausible. despite the presence of lipids in the crystallisation set-up, korotkova et al. ( ) could not find lipids within the crystal structure of espb - , which lacks the c-terminus. our results show that the c-terminus of espb contributes to the protein’s preferred orientation on an em grid, caused by an interaction with the hydrophobic air-water interface (noble et al., ), analogous to what could happen on a lipid membrane. with a ppii helix at the end of the channel followed by hydrophobic patches at the c-terminus, we hypothesise that this secondary structure interacts with the head group of the lipids, as has been described for other ppii helices (franz et al., ), allowing the hydrophobic residues to insert into bilayer membranes. based on the chemical properties of the channel and supported by the evidence of an extra espb monomer observed within the oligomer, we propose that espb could be a structural element of esx- allowing other substrates to transit through the channel.

the combined data presented in our work lead us to hypothesise three models of the role of the espb oligomer. espb within the cytosol is likely to be monomeric (korotkova et al., ), either free or chaperoned by espk (mclaughlin et al., ). binding of a chaperone to the helical tip of espb would leave the wxg and yxxxd bipartite secretion signal exposed on the top of espb, ready to interact with the t ss machinery. upon exiting the esx- inner-membrane pore, the pre-protein espb will be cleaved within the periplasm by mycp . analogous to esx- (bunduc et al., b), we expect mycp to cap the central periplasmic dome-like chamber formed by eccb , and to have its proteolytic site facing its central pore. the cleavage of the c-terminal region at a (solomonson et al., ) will remove the most hydrophilic part of the c-terminus, leaving a hydrophobic tail (fig ev a). from here, we propose three possible pathways for the oligomerisation of espb.

in one scenario, after processing of the c-terminus, espb binds the outer membrane of mycobacteria, increasing its critical concentration to form an oligomer (fig model ). as suggested above, it can be assumed that espb monomers would transit through the inner-membrane pore top first, where the c-terminus as well as the wxg and yxxxd motifs are located. in this position, the monomers would already be properly oriented to form an oligomer on the inner leaflet of the outer membrane, just as espb - attaches to the air-water interface of a cryo-em grid (fig a). the inner pore of the espb heptamer has dimensions similar to those proposed by beckham et al. for the esx- hexameric structure (beckham et al., ), although they later published a higher-resolution structure with a more constricted pore in a closed state (beckham et al., ). the space between the inner and outer membrane has been reported to be – nm wide (dulberger et al., ; sani et al., ; zuber et al., ), which could accommodate the -nm long espb heptamer. it was postulated (piton et al., ) that the positively charged interior of the espb channel could play a role in the transfer of negatively charged substrates such as dna or phospholipids.
however, in analogy to the negative lumen of a bacteriophage tail that is used to transfer dna (zinke et al., ), we propose that the positively charged interior space of the espb oligomer would channel substrates of the same charge, as negatively charged substrates would most likely bind and get trapped. since the heptameric structure presented here lacks any trans-membrane domains and is highly soluble, it is unlikely to be embedded within the outer membrane, but it could be anchored by its c-terminus, forming part of a larger machinery that completes the esx- core complex. espb is well known to be secreted into the culture medium of mycobacteria (lodes et al., ); thus, in this model, espb would help in its own secretion by forming a channel through which additional substrates, like espb itself, could travel.

in a second scenario, after espb is secreted outside the bacterium, it could interact with either the phagosomal membrane or the external face of the outer membrane (fig model ). the aforementioned hypothesis of how espb is secreted (c-terminus and wxg/yxxxd motifs first) would favour interaction with the phagosomal membrane; however, there is also some evidence of espb being extracted from the outermost layer of the bacterium (sani et al., ). based on different experiments (fig ), oligomerisation is expected to be concentration dependent. interaction with protein structures or a membrane could increase the local concentration, making the system more efficient.
in the third, more speculative model, espb would undergo a conformational change, as observed for some pore-forming proteins such as the amphitropic gasdermins (liu & lieberman, ). upon proteolysis, a pre-pore ring could assemble prior to membrane insertion (ruan et al., ). recently, it was suggested that the pe/ppe family of proteins could form small molecule-selective channels analogous to outer-membrane porins, allowing m. tuberculosis to take up nutrients through its almost impermeable cell wall (wang et al., ). despite this evidence, it remains a mystery how such soluble heterodimers would insert into a membrane. we hypothesise that, analogous to the heterodimer esxa/esxb where esxa alone can insert into a membrane in acidic conditions (de leon et al., ), the amphiphilic helices of either pe or ppe alone might insert into the membrane. espb is fundamentally different from pe/ppe pairs in the sense that its pe and ppe parts are fused into a single protein, joined by one long flexible linker able to adopt multiple conformations (piton et al., ). unlike the esxa-esxb heterodimer, where esxa would act independently from esxb upon membrane insertion, the pe moiety of espb would still be linked to its ppe counterpart even if the latter would insert itself into a membrane. we speculate that such a linker could allow espb to form tubular-like structures while exchanging pe and ppe domains between different molecules (fig model ). such higher-order oligomers, as described for espc (lou et al., ) and occasionally also found in our data (fig ev ), could be a component of the secretion apparatus.

our hypothesis that espb acts as a scaffold or structural component of the secretion apparatus is supported by earlier findings. as esx- works through a contact-dependent mechanism and not by secretion of toxins (conrad et al., ), it is possible that the cytotoxic effects on macrophages observed by chen et al. for espb were the result of an increase in the activity of the machinery (chen et al., ).

most of the work described here favours model or . more evidence needs to be gathered to falsify or verify any of the models. techniques like in situ cryo-electron tomography of infected immune cells could be used to provide visual insight.

in summary, this study reveals factors that prime the oligomerisation of espb and presents evidence supporting the hypothesis that espb is a structural element of the esx- secretion system, possibly acting on a lipid membrane. esx- is a major player in the virulence of mycobacterial species, like m. tuberculosis. however, after decades of arduous research, our understanding of the structure and the mechanism of action of this system remains limited. here we provide a structural and possibly functional understanding of an esx- element. full understanding of all the esx- components and structural states could guide structure-based drug and vaccine design to tackle the global health threat of tuberculosis.

materials and methods

cloning, expression and purification of espb constructs

the constructs used in this study are listed in s table. dna fragments were pcr-amplified with kod hot start master mix (novagen®) from genomic dna of m. tuberculosis h rv, m. marinum or m. smegmatis [bei resources, national institute of allergy and infectious diseases (niaid)], and cloned in a modified prset backbone (invitrogen™) using nsii and hindiii restriction sites. constructs included an n-terminal ×his-tag followed by a tobacco etch virus (tev) protease cleavage site. espb mutants and constructs espb - and espb - were generated using the kod-plus- mutagenesis kit (toyobo co., ltd.) from the plasmid encoding the full-length protein. all plasmids were sequenced to verify the absence of inadvertent mutations. the m. haemophilum and pe -ppe constructs were synthesised and codon optimised for expression in escherichia coli (eurofins genomics).

for the non–codon optimised constructs, proteins were expressed in rosetta (de ) e. coli cells in overnight express™ instant lb medium (emd millipore) supplemented with μg/ml of carbenicillin and μg/ml of chloramphenicol for h at °c. in the case of codon optimisation, the protein was expressed in c (de ) e. coli cells in the same conditions with the respective antibiotic. prior to protein purification, cells were resuspended in buffer containing mm tris-hcl (ph . ), mm nacl, mm pmsf, and u/ml benzonase, and were lysed using an emulsiflex-c homogenizer (avestin). proteins were purified with hispur™ ni-nta resin (thermofisher) equilibrated in the lysis buffer and eluted in the same buffer supplemented with mm imidazole. the ×his-tag was cleaved using tev protease, followed by a second ni-nta purification to remove the free ×his-tag, uncleaved protein and the his-tagged protease (kapust et al., ).
in case higher purity was needed, proteins were purified on a size-exclusion superdex increase / gl column (ge healthcare) in buffer containing mm tris-hcl (ph . ), mm nacl. protein was stored at - °c until further use.

analytical size exclusion chromatography (sec)

samples were dialysed overnight in the corresponding buffers and different concentrations of protein were loaded onto a size-exclusion superdex increase . / column (ge healthcare life science) at a flow rate of µl/min. the basic buffer comprised mm tris-hcl (ph . ), mm nacl, while the acidic buffer was mm acetate buffer (ph . ), mm nacl.

native mass spectrometry

native mass spectrometry was used to obtain high-resolution mass information on the samples. m. tuberculosis espb - ( mg/ml) was buffer exchanged with mm ammonium acetate (at ph . and . ) using a -kda molecular weight cut-off dialysis membrane overnight, followed by an extra hour of buffer exchange with a fresh ammonium acetate solution at °c. the buffer exchange of fragments produced by limited proteolysis ( mg/ml) was performed using sec on a superdex increase . / column (ge healthcare life science) with mm ammonium acetate at ph . . acetic acid and ammonium hydroxide were used to adjust the ph of the ammonium acetate solution. the mass spectrometry measurements were performed in positive ion mode on an ultra-high mass range (uhmr) q-exactive orbitrap mass spectrometer (thermo fisher scientific) with a static nano-electrospray ionization (nesi) source. in-house pulled, gold-coated borosilicate capillaries were used for the sample introduction to the mass spectrometer, and a voltage of . kv was applied.
mass spectral resolution was set at , to , (at m/z= ) and an injection time of to ms was used. for each spectrum, scans were combined, containing to microscans. the inlet capillary temperature was kept at °c. parameters such as in-source trapping, transfer m/z, detector m/z, trapping gas pressure and mass range were optimized for each analyte separately. all mass spectra were analysed using thermo scientific xcalibur software and spectral deconvolutions were performed with the unidec software (marty et al., ).

cryo-em sample preparation, data acquisition and image processing

samples, in mm acetate buffer (ph . ), mm nacl, were diluted to the respective concentrations (table ). a volume of . μl of each sample was applied on glow-discharged ultraufoil au r . / . grids (quantifoil), and excess liquid was removed by blotting for s (blot force ) using filter paper, followed by plunge freezing in liquid ethane using a fei vitrobot mark iv at % humidity at °c. for pe -ppe , an acetate buffer (ph . ), mm nacl was used, due to precipitation of the protein at lower ph.

cryo-em single particle analysis (spa) data were collected using untilted and tilted schemes (tan et al., ). for espb - from m. tuberculosis and espb - from m. marinum, untilted images were recorded on a titan krios at kv with a k detector operated in super-resolution counting mode. tilted spa data were collected for espb - from m. tuberculosis on a -kv tecnai arctica tem using serialem (mastronarde, ), using a falcon iii detector in counting mode. table shows all specifications and statistics for the data sets.
individual micrographs of espb - from m. haemophilum, espb - from m. smegmatis as well as pe -ppe from m. tuberculosis were collected on the -kv arctica.

data were processed using the relion- pipeline (zivanov et al., ). movie stacks were corrected for drift ( × patches) and dose-weighted using motioncor (zheng et al., ). the local contrast transfer function (ctf) parameters were determined for the drift-corrected micrographs using gctf (zhang, ). the espb - data set was collected at two angles of the stage: degrees and degrees. for each tilt angle, a first set of d references was generated from manually picked particles in relion (scheres, ) and these were used for subsequent automatic particle picking. table lists the number of particles in the final data set after particle picking, d classification and d classification. the d classification was run without imposing symmetry and used to select the heptameric particles. local ctf parameters were iteratively refined (zivanov et al., ), which was particularly important for the tilted data set; beam-tilt parameters were estimated and particles were polished. particle subtraction followed by focused classification was used to characterise densities other than those accounted for by the refined model described below. due to the extreme preferred orientation in the data sets of espb - , automatic masking and automatic b-factor estimation in post-processing were hampered by missing wedge artefacts. for this data set, parameters were manually optimised by visual inspection of the resulting maps.
density within the heptameric pore was obtained by a combination of d and d classification. the initial density map of a loaded complex was generated by symmetry expansion of a c d-refined particle list, followed by d classification in c without further image alignment. later iterations employed unique particles and d refinement in c while imposing local symmetry for the heptamer. the resulting . Å map was used to identify a total of eight espb monomers (heptamer plus one in the middle), and was local-symmetry averaged. the final resolution of the heptamer maps, listed in table , varied between . and . Å, using the gold-standard fsc= . criterion (scheres & chen, ).

structure determination and refinement

the pdb model xxx (korotkova et al., ) was used as a starting model in coot (emsley & cowtan, ) for manual docking and building into the tilted-scheme spa data set of espb - of m. tuberculosis. the final model was refined against the high-resolution sharpened map of espb - of m. tuberculosis. this model was later used as a reference for the m. marinum model. models were refined iteratively through rounds of manual adjustment in coot (emsley et al., ), real space refinement in phenix (afonine et al., ) and structure validation using molprobity (williams et al., ).

limited proteolysis and edman sequencing

samples were incubated with trypsin for different lengths of time at a molar ratio of : (enzyme:substrate) following the proti-ace™ kit (hampton research) recommendation. the reaction was stopped by adding sds-page loading buffer ( mm tris-hcl, % sds, % glycerol, . % bromophenol blue) and samples were resolved on a % polyacrylamide gel. bands were transferred from the sds-page gel to a pvdf membrane and stained with . % coomassie brilliant blue r- , % methanol, and % acetic acid until bands were visible. the membrane was then washed with water and dried, and espb cleavage products were cut out. the first ten amino acids were determined by edman sequencing at the plateforme protéomique pissaro irib at the université de rouen.

circular dichroism spectroscopy

the cd spectra of µm espb - were recorded either in mm phosphate (ph . ), mm nacl or mm acetate (ph . ), mm nacl at °c in the far-uv region using a jasco j- cd spectropolarimeter (jasco analytical instruments) on a . cm path-length cell. spectra correspond to the average of five repetitive scans acquired every nm with -s average time per point and -nm band pass. temperature was regulated with a peltier temperature-controlled cell holder. data were corrected by subtracting the cd signal of the buffer over the same wavelength region. the effect of , , -trifluoroethanol (tfe) was recorded using the aforementioned phosphate buffer. secondary structure content was estimated by deconvolution using the program bestsel (micsonai et al., ).

data availability

the final maps as well as the half-maps and masks will be deposited in empiar. the refined m. tuberculosis and m. marinum models will be deposited within the protein data bank.
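the cd data treatment described under circular dichroism spectroscopy (averaging of repeated scans, buffer subtraction, and a difference spectrum between two ph conditions) can be illustrated as follows. this is a minimal sketch under stated assumptions, not the instrument software's processing: the function names are hypothetical, spectra are assumed to be plain lists sampled on the same wavelength grid, and the numbers in the usage example are invented toy values.

```python
from typing import List, Sequence


def average_scans(scans: Sequence[Sequence[float]]) -> List[float]:
    """Point-by-point mean of repeated scans on a shared wavelength grid."""
    if not scans:
        raise ValueError("need at least one scan")
    n_points = len(scans[0])
    if any(len(scan) != n_points for scan in scans):
        raise ValueError("all scans must share the same wavelength grid")
    return [sum(point) / len(scans) for point in zip(*scans)]


def subtract(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Point-by-point difference a - b of two spectra on the same grid."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of points")
    return [x - y for x, y in zip(a, b)]


def corrected_spectrum(sample_scans: Sequence[Sequence[float]],
                       buffer_spectrum: Sequence[float]) -> List[float]:
    """Average the sample scans, then subtract the buffer signal."""
    return subtract(average_scans(sample_scans), buffer_spectrum)


if __name__ == "__main__":
    # Toy ellipticity values at three wavelengths (invented numbers).
    scans_ph_a = [[-10.0, -4.0, 1.0], [-12.0, -6.0, 3.0]]
    scans_ph_b = [[-8.0, -5.0, 0.0], [-8.0, -5.0, 2.0]]
    buffer = [-1.0, -1.0, 0.0]

    spec_a = corrected_spectrum(scans_ph_a, buffer)
    spec_b = corrected_spectrum(scans_ph_b, buffer)
    # Difference spectrum between the two conditions.
    print(subtract(spec_a, spec_b))
```

because the same buffer baseline is subtracted from both conditions in this toy case, it cancels in the difference spectrum; with condition-specific buffers, each spectrum would be corrected against its own baseline first.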
acknowledgments

we thank paul van schayck (um) for indispensable serialem and it support; the microscopy core lab (um) for their technical and scientific support; yue zhang (um) for help in model refinement; chris lewis (um) for help with the tomograms; laurent coquet (université de rouen, france) for edman sequencing; florence pojer and stewart cole (global health institute, lausanne, switzerland) for initial sample aliquots and preliminary studies; ron heeren and shane ellis (um) for native mass spectrometry support; and hang nguyen (um) for critical reading of the manuscript. this research received funding from the netherlands organisation for scientific research (nwo) in the framework of the fund new chemical innovations, numbers . . and . . , from the european union’s horizon research and innovation programme under grant agreement no q- sort. this research is also part of the m i research programme supported by the dutch province of limburg through the link programme.

author contributions

ag and rbgr designed the study and wrote the manuscript. ag, vv, yg, gt, as, nsp and am performed the experiments. ag, gt, nsp and rbgr analysed the data. clp, pjp and rbgr supervised the project. all authors read and approved the final manuscript.

declaration of interests

the authors declare no competing interests.

references

abdallah am, bestebroer j, savage nd, de punder k, van zon m, wilson l, korbee cj, van der sar am, ottenhoff th, van der wel nn et al ( ) mycobacterial secretion systems esx- and esx- play distinct roles in host cell death and inflammasome activation.
j immunol : -
abdallah am, gey van pittius nc, champion pa, cox j, luirink j, vandenbroucke-grauls cm, appelmelk bj, bitter w ( ) type vii secretion--mycobacteria show the way. nat rev microbiol : -
afonine pv, klaholz bp, moriarty nw, poon bk, sobolev ov, terwilliger tc, adams pd, urzhumtsev a ( ) new tools for the analysis and validation of cryo-em maps and atomic models. acta crystallographica section d : -
ates ls, ummels r, commandeur s, van de weerd r, sparrius m, weerdenburg e, alber m, kalscheuer r, piersma sr, abdallah am et al ( ) essential role of the esx- secretion system in outer membrane permeability of pathogenic mycobacteria. plos genet : e
baker na, sept d, joseph s, holst mj, mccammon ja ( ) electrostatics of nanosystems: application to microtubules and the ribosome. proc natl acad sci u s a : -
beckham ks, ciccarelli l, bunduc cm, mertens hd, ummels r, lugmayr w, mayr j, rettel m, savitski mm, svergun di et al ( ) structure of the mycobacterial esx- type vii secretion system membrane complex by single-particle analysis. nat microbiol :
beckham ks, ritter c, chojnowski g, mullapudi e, rettel m, savitski mm, mortensen sa, kosinski j, wilmanns m ( ) structure of the mycobacterial esx- type vii secretion system hexameric pore complex. biorxiv
bitter w, houben en, bottai d, brodin p, brown ej, cox js, derbyshire k, fortune sm, gao ly, liu j et al ( ) systematic genetic nomenclature for type vii secretion systems. plos pathog : e
brennan pj, nikaido h ( ) the envelope of mycobacteria.
annu rev biochem : -
bunduc cm, bitter w, houben eng ( a) structure and function of the mycobacterial type vii secretion systems. annu rev microbiol
bunduc cm, fahrenkamp d, wald j, ummels r, bitter w, houben en, marlovits tc ( b) structure and dynamics of the esx- type vii secretion system of mycobacterium tuberculosis. biorxiv
burggraaf mj, ates ls, speer a, van der kuij k, kuijl c, bitter w ( ) optimization of secretion and surface localization of heterologous ova protein in mycobacteria by using lipy as a carrier. microb cell fact :
carlsson f, joshi sa, rangell l, brown ej ( ) polar localization of virulence-related esx- secretion in mycobacteria. plos pathog : e
cascioferro a, delogu g, colone m, sali m, stringaro a, arancia g, fadda g, palu g, manganelli r ( ) pe is a functional domain responsible for protein translocation and localization on mycobacterial cell wall. mol microbiol : -
chen jm, zhang m, rybniker j, boy-rottger s, dhar n, pojer f, cole st ( ) mycobacterium tuberculosis espb binds phospholipids and mediates esxa-independent virulence. mol microbiol : -
conrad wh, osman mm, shanahan jk, chu f, takaki kk, cameron j, hopkinson-woolley d, brosch r, ramakrishnan l ( ) mycobacterial esx- secretion system mediates host cell lysis through bacterium contact-dependent gross membrane disruptions. proc natl acad sci u s a : -
de leon j, jiang g, ma y, rubin e, fortune s, sun j ( ) mycobacterium tuberculosis esat- exhibits a unique membrane-interacting activity that is not found in its ortholog from non-pathogenic mycobacterium smegmatis. j biol chem : -
dolinsky tj, nielsen je, mccammon ja, baker na ( ) pdb pqr: an automated pipeline for the setup of poisson-boltzmann electrostatics calculations. nucleic acids res : w -
dulberger cl, rubin ej, boutte cc ( ) the mycobacterial cell envelope - a moving target.
nat rev microbiol : -
ekiert dc, cox js ( ) structure of a pe-ppe-espg complex from mycobacterium tuberculosis reveals molecular specificity of esx protein secretion. proc natl acad sci u s a : -
emsley p, cowtan k ( ) coot: model-building tools for molecular graphics. acta crystallogr d biol crystallogr : -
emsley p, lohkamp b, scott wg, cowtan k ( ) features and development of coot. acta crystallographica section d : -
famelis n, rivera-calzada a, degliesposti g, wingender m, mietrach n, skehel jm, fernandez-leiro r, bottcher b, schlosser a, llorca o et al ( ) architecture of the mycobacterial type vii secretion system. nature
ferluga j, yasmin h, al-ahdal mn, bhakta s, kishore u ( ) natural and trained innate immunity against mycobacterium tuberculosis. immunobiology :
flint jl, kowalski jc, karnati pk, derbyshire km ( ) the rd virulence locus of mycobacterium tuberculosis regulates dna transfer in mycobacterium smegmatis. proc natl acad sci u s a : -
franz j, lelle m, peneva k, bonn m, weidner t ( ) sap(e) - a cell-penetrating polyproline helix at lipid interfaces. biochim biophys acta : -
gey van pittius nc, gamieldien j, hide w, brown gd, siezen rj, beyers ad ( ) the esat- gene cluster of mycobacterium tuberculosis and other high g+c gram-positive bacteria. genome biol : research
goddard td, huang cc, meng ec, pettersen ef, couch gs, morris jh, ferrin te ( ) ucsf chimerax: meeting modern challenges in visualization and analysis.
protein sci : -
gray ta, clark rr, boucher n, lapierre p, smith c, derbyshire km ( ) intercellular communication and conjugation are mediated by esx secretion systems in mycobacteria. science : -
henderson r ( ) the potential and limitations of neutrons, electrons and x-rays for atomic resolution microscopy of unstained biological molecules. q rev biophys : -
houben d, demangel c, van ingen j, perez j, baldeon l, abdallah am, caleechurn l, bottai d, van zon m, de punder k et al ( a) esx- -mediated translocation to the cytosol controls virulence of mycobacteria. cell microbiol : -
houben en, bestebroer j, ummels r, wilson l, piersma sr, jimenez cr, ottenhoff th, luirink j, bitter w ( b) composition of the type vii secretion system membrane complex. mol microbiol : -
joosten rp, salzemann j, bloch v, stockinger h, berglund ac, blanchet c, bongcam-rudloff e, combet c, da costa al, deleage g et al ( ) pdb_redo: automated re-refinement of x-ray structure models in the pdb. j appl crystallogr : -
kapust rb, tozser j, fox jd, anderson de, cherry s, copeland td, waugh ds ( ) tobacco etch virus protease: mechanism of autolysis and rational design of stable mutants with wild-type catalytic proficiency. protein eng : -
korotkova n, freire d, phan th, ummels r, creekmore cc, evans tj, wilmanns m, bitter w, parret ah, houben en et al ( ) structure of the mycobacterium tuberculosis type vii secretion system chaperone espg in complex with pe -ppe dimer. mol microbiol : -
korotkova n, piton j, wagner jm, boy-rottger s, japaridze a, evans tj, cole st, pojer f, korotkov kv ( ) structure of espb, a secreted substrate of the esx- secretion system of mycobacterium tuberculosis. j struct biol : -
larkin ma, blackshields g, brown np, chenna r, mcgettigan pa, mcwilliam h, valentin f, wallace im, wilm a, lopez r et al ( ) clustal w and clustal x version . . bioinformatics : -
liu x, lieberman j ( ) knocking 'em dead: pore-forming proteins in immune defense. annu rev immunol : -
lodes mj, dillon dc, mohamath r, day ch, benson dr, reynolds ld, mcneill p, sampaio dp, skeiky ya, badaro r et al ( ) serological expression cloning and immunological evaluation of mtb , a novel mycobacterium tuberculosis antigen. j clin microbiol : -
lou y, rybniker j, sala c, cole st ( ) espc forms a filamentous structure in the cell envelope of mycobacterium tuberculosis and impacts esx- secretion. mol microbiol : -
luo p, baldwin rl ( ) mechanism of helix induction by trifluoroethanol: a framework for extrapolating the helix-forming properties of peptides from trifluoroethanol/water mixtures back to water. biochemistry : -
ma y, keil v, sun j ( ) characterization of mycobacterium tuberculosis esxa membrane insertion: roles of n- and c-terminal flexible arms and central helix-turn-helix motif. j biol chem : -
marty mt, baldwin aj, marklund eg, hochberg gk, benesch jl, robinson cv ( ) bayesian deconvolution of mass and ion mobility spectra: from binary interactions to polydisperse ensembles. anal chem : -
mastronarde dn ( ) automated electron microscope tomography using robust prediction of specimen movements. j struct biol : -
mclaughlin b, chon js, macgurn ja, carlsson f, cheng tl, cox js, brown ej ( ) a mycobacterium esx- -secreted virulence factor with unique requirements for export.
plos pathog : e micsonai a, wien f, bulyaki e, kun j, moussong e, lee yh, goto y, refregiers m, kardos j ( ) bestsel: a web server for accurate protein secondary structure prediction and fold recognition from the circular dichroism spectra. nucleic acids res : w -w noble aj, wei h, dandey vp, zhang z, tan yz, potter cs, carragher b ( ) reducing effects of particle adsorption to the air-water interface in cryo-em. nat methods : - ohol ym, goetz dh, chan k, shiloh mu, craik cs, cox js ( ) mycobacterium tuberculosis mycp protease plays a dual role in regulation of esx- secretion and virulence. cell host microbe : - olsson mh, sondergaard cr, rostkowski m, jensen jh ( ) propka : consistent treatment of internal and surface residues in empirical pka predictions. j chem theory comput : - phan th, ummels r, bitter w, houben en ( ) identification of a substrate domain that determines system specificity in mycobacterial type vii secretion systems. sci rep : phan th, van leeuwen lm, kuijl c, ummels r, van stempvoort g, rubio-canalejas a, piersma sr, jimenez cr, van der sar am, houben eng et al ( ) esph is a hypervirulence factor for mycobacterium marinum and essential for the secretion of the esx- substrates espe and espf. plos pathog : e piton j, pojer f, wakatsuki s, gati c, cole st ( ) high resolution cryoem structure of the ring-shaped virulence factor espb from mycobacterium tuberculosis. j struct biol: x poweleit n, czudnochowski n, nakagawa r, trinidad dd, murphy kc, sassetti cm, rosenberg os ( ) the structure of the endogenous esx- secretion system.
elife pym as, brodin p, majlessi l, brosch r, demangel c, williams a, griffiths ke, marchal g, leclerc c, cole st ( ) recombinant bcg exporting esat- confers enhanced protection against tuberculosis. nat med : - rosenberg os, dovala d, li x, connolly l, bendebury a, finer-moore j, holton j, cheng y, stroud rm, cox js ( ) substrates control multimerization and activation of the multi-domain atpase motor of type vii secretion. cell : - ruan j, xia s, liu x, lieberman j, wu h ( ) cryo-em structure of the gasdermin a membrane pore. nature : - rucker al, creamer tp ( ) polyproline ii helical structure in protein unfolded states: lysine peptides revisited. protein sci : - sani m, houben en, geurtsen j, pierson j, de punder k, van zon m, wever b, piersma sr, jimenez cr, daffe m et al ( ) direct visualization by cryo-em of the mycobacterial capsular layer: a labile structure containing esx- -secreted proteins. plos pathog : e schaberg t, rebhan k, lode h ( ) risk factors for side-effects of isoniazid, rifampin and pyrazinamide in patients hospitalized for pulmonary tuberculosis. eur respir j : - scheres sh ( ) relion: implementation of a bayesian approach to cryo-em structure determination. j struct biol : - scheres sh, chen s ( ) prevention of overfitting in cryo-em structure determination. nat methods : - siegrist ms, unnikrishnan m, mcconnell mj, borowsky m, cheng ty, siddiqi n, fortune sm, moody db, rubin ej ( ) mycobacterial esx- is required for mycobactin-mediated iron acquisition. proc natl acad sci u s a : - simeone r, bobard a, lippmann j, bitter w, majlessi l, brosch r, enninga j ( ) phagosomal rupture by mycobacterium tuberculosis results in toxicity and host cell death. plos pathog : e smart os, goodfellow jm, wallace ba ( ) the pore dimensions of gramicidin a. 
biophys j : - solomonson m, huesgen pf, wasney ga, watanabe n, gruninger rj, prehna g, overall cm, strynadka nc ( ) structure of the mycosin- protease from the mycobacterial esx- protein type vii secretion system. j biol chem : - solomonson m, setiaputra d, makepeace kat, lameignere e, petrotchenko ev, conrady dg, bergeron jr, vuckovic m, dimaio f, borchers ch et al ( ) structure of espb from the esx- type vii secretion system and insights into its export mechanism. structure : - stanley sa, johndrow je, manzanillo p, cox js ( ) the type i ifn response to infection with mycobacterium tuberculosis requires esx- -mediated secretion and contributes to pathogenesis. j immunol : - tan yz, baldwin pr, davis jh, williamson jr, potter cs, carragher b, lyumkis d ( ) addressing preferred specimen orientation in single-particle cryo-em through tilting. nat methods : - theillet fx, kalmar l, tompa p, han kh, selenko p, dunker ak, daughdrill gw, uversky vn ( ) the alphabet of intrinsic disorder: i. act like a pro: on the abundance and roles of proline residues in intrinsically disordered proteins. intrinsically disord proteins : e uversky vn ( ) the alphabet of intrinsic disorder: ii. various roles of glutamic acid in ordered and intrinsically disordered proteins. intrinsically disord proteins : e van der wel n, hava d, houben d, fluitsma d, van zon m, pierson j, brenner m, peters pj ( ) m. tuberculosis and m. leprae translocate from the phagolysosome to the cytosol in myeloid cells.
cell : - van winden vj, ummels r, piersma sr, jimenez cr, korotkov kv, bitter w, houben en ( ) mycosins are required for the stabilization of the esx- and esx- type vii secretion membrane complexes. mbio wang q, boshoff him, harrison jr, ray pc, green sr, wyatt pg, barry ce, rd ( ) pe/ppe proteins mediate nutrient transport across the outer membrane of mycobacterium tuberculosis. science : - waterhouse am, procter jb, martin dm, clamp m, barton gj ( ) jalview version --a multiple sequence alignment editor and analysis workbench. bioinformatics : - williams cj, headd jj, moriarty nw, prisant mg, videau ll, deis ln, verma v, keedy da, hintze bj, chen vb et al ( ) molprobity: more and better reference data for improved all-atom structure validation. protein science : - williamson za, chaton ct, ciocca wa, korotkova n, korotkov kv ( ) pe -ppe -espg heterotrimer structure from mycobacterial esx- secretion system gives insight into cognate substrate recognition by esx systems. j biol chem world health organization, . global tuberculosis report. geneva: world health organization. xie nz, du qs, li jx, huang rb ( ) exploring strong interactions in proteins with quantum chemistry and examples of their applications in drug design. plos one : e xu j, laine o, masciocchi m, manoranjan j, smith j, du sj, edwards n, zhu x, fenselau c, gao ly ( ) a unique mycobacterium esx- protein co-secretes with cfp- /esat- and is necessary for inhibiting phagosome maturation. mol microbiol : - zhang k ( ) gctf: real-time ctf determination and correction. j struct biol : -
zhang y, tammaro r, peters pj, ravelli rbg ( ) could egg white lysozyme be solved by single particle cryo-em? j chem inf model : - zheng sq, palovcak e, armache jp, verba ka, cheng y, agard da ( ) motioncor : anisotropic correction of beam-induced motion for improved cryo-electron microscopy. nat methods : - zinke m, sachowsky kaa, Öster c, zinn-justin s, ravelli rbg, schröder gf, habeck m, lange a ( ) spinal column architecture of the flexible spp bacteriophage tail tube. nature communications : zivanov j, nakane t, forsberg bo, kimanius d, hagen wj, lindahl e, scheres sh ( ) new tools for automated high-resolution cryo-em structure determination in relion- . elife zuber b, chami m, houssin c, dubochet j, griffiths g, daffe m ( ) direct visualization of the outer membrane of mycobacteria and corynebacteria in their native state. j bacteriol : -

figure legends

fig . oligomerisation of espb is promoted by an acidic environment. (a) size exclusion chromatography profiles of m. tuberculosis espb - at µm in mm acetate buffer (ph . ), mm nacl and mm tris (ph . ), mm nacl. void volume corresponds to . ml elution volume. (b) oligomer/monomer ratios at different protein concentrations in conditions from panel (a). the absorbance values of the oligomer were taken at . ml while the monomer values were at . ml. (c–d) presence of the different oligomer species from m. tuberculosis espb - at ph . and ph . obtained by native mass spectrometry.

fig . impact of espb c-terminus processing on oligomerisation.
(a) scheme of the different constructs used in this work, where espb - is in blue, espb - in orange (mycp cleavage site), espb - in grey, espb - in yellow and espb - in green. the structural model is from pdb id xxx, while the c-terminal region is represented as an unfolded protein. arrows represent the end of each construct. (b) size exclusion chromatograms of each construct corresponding to the colours in panel (a), resulting from µl sample injection at µm eluted in mm acetate buffer (ph . ), mm nacl. void volume corresponds to . ml elution volume.

fig . oligomerisation differences between espb orthologues despite sharing similar tertiary structure. (a) evaluation of the oligomerisation of espb orthologues and pe -ppe by cryo-electron microscopy. scale bars represent nm. (b) different views of structural alignment of espbmtb (yellow – this work), espbmmar (green – this work), espbmsmeg (light blue – pdb id wj ), and pe -ppe (orange – pdb id w k). (c) multiple alignment of amino acid sequences of different species from the mycobacterium genus, as well as the protein pair pe -ppe . numbering and sequence identity are based on the sequence of m. tuberculosis. rectangles denote residues involved in the oligomerisation of espb. the alignment was generated using the clustalw server, and the figure was created using jalview software (waterhouse et al., ). the colour scheme of clustalx is used (larkin et al., ).

fig . loss of the espb preferential orientation by removal of its c-terminal residues.
(a–b) representative micrograph of espb - with preferential orientation at °-tilt angle or °-tilt angle. (c) representative micrograph of espb - with random orientation taken at °-tilt angle. insets correspond to the respective d classes. scale bars in a–c represent nm; scale bars in insets represent nm.

fig . cryo-em reconstruction of espb - heptamer complex. (a–b) density map and structural model made with chimerax (goddard et al., ), showing each monomer in different colours. for (a) and (b), the upper panels show the top views and the bottom panels show the side views. (c) model and densities of intramolecular interaction at w -y and intermolecular interaction at q -q . colours follow the conventional colouring code for chemical elements.

fig . characterisation of the espb oligomer. electrostatic potential of the espb oligomer at (a) acidic and (b) neutral ph. the protonation state was assigned by propka (olsson et al., ) and electrostatic calculations were generated by apbs (baker et al., ) and pdbpqr (dolinsky et al., ). (c) the smallest inner diameter of the espb oligomer is Å, as calculated by hole (smart et al., ). (d) surface representation of amino acid hydrophobicity according to the kyte-doolittle scale (polar residues – purple, non-polar residues – gold). (e) high-resolution d class of espb heptamer with extra density in the middle. (f) d projection of the d map obtained for the + espb oligomer. (g) c d map of + espb oligomer with local symmetry applied to the heptamer ring. (h) c d map of the + espb oligomer with -fold local symmetry applied and models fitted to the map.

fig . putative pathways for the oligomerisation of espb. in model , espb is cleaved at its c-terminus by the protease mycp in the periplasm of mycobacteria, leaving hydrophobic residues to insert into the outer membrane; an increase in the local concentration on the membrane leads to oligomerisation of espb. in model , secretion of espb across the double membrane after mycp cleavage allows the protein to bind to either the phagosomal membrane or the external part of the outer membrane. in model , after cleavage in the periplasm and secretion to the exterior of the bacterium, espb undergoes a conformational change dissociating the pe and ppe domains and exposing hydrophobic residues that would allow the insertion into the membrane; while the ppe gets embedded into the membrane in an oligomeric form, the respective pe is able to interact with the ppe of a second molecule, forming a tubular structure. different colours are used for each heptamer subunit. regardless of which oligomerisation pathway espb follows, oligomerised espb is hypothesised to form part of the larger machinery that completes the inner-membrane complex of esx- .

tables and their legends

table . constructs used in this study
plasmid name | species | gene product | concentration used for cryo-em experiments
pag | m. tuberculosis | ×his-espb - | . mg/ml
pag | m. tuberculosis | ×his-espb - | . mg/ml
pag | m. tuberculosis | ×his-espb - | . mg/ml
pag | m. tuberculosis | ×his-espb - | mg/ml
pag | m. tuberculosis | ×his-espb - | -
pag | m. tuberculosis | ×his-mbp-espb - | -
pag | m. tuberculosis | ×his-espb - q a | -
pag | m. tuberculosis | ×his-espb - n c/t c | -
pag | m. marinum | ×his-espb - | . mg/ml
pag | m. marinum | ×his-espb - | . mg/ml
pag | m. haemophilum | ×his-espb - | mg/ml
pag | m. smegmatis | ×his-espb - | mg/ml
pag | m. tuberculosis | pe / ×his-ppe | mg/ml

table . statistics of cryo-em data collection, reconstruction and structure refinement
columns: espb - m. tuberculosis | espb - m. tuberculosis | espb - m. marinum
grid type: quantifoil ultraaufoil au mesh r / | quantifoil ultraaufoil au mesh r . / . | quantifoil ultraaufoil au mesh r . / .
microscope: tfs tecnai arctica | tfs krios | tfs krios
camera: falcon iii | k electron counting | k electron counting
automated data acquisition software: serialem | epu | epu
nominal magnification (k×)
physical pixel size (Å) . . .
exposure time (s) . .
fluence (e− Å− )
micrographs
#fractions
particles
symmetry imposed: c | c | c
average resolution (Å) . . .
fsc threshold . . .
map sharpening b factor (Å ) - − -
refinement
initial model used (pdb entry) xxx
model resolution (Å)
fsc threshold . .
model composition
atoms
hydrogen atoms
protein residues
waters
b factors (Å )
r.m.s. deviations
bond lengths (Å) . .
bond angles (°) . .
correlation coefficients
mask . .
box . .
validation
molprobity score . .
clashscore . .
rotamers outliers (%) . .
ramachandran plot
favored (%) . .
allowed (%) . .
disallowed (%) . .

expanded view figure legends

fig ev . sequence alignment of espb from different mycobacterial species. numbering and sequence identity are based on the sequence of m. tuberculosis. the alignment was generated using the clustalw server, and the figure was created using jalview software (waterhouse et al., ). the colour scheme of clustalx is used (larkin et al., ).

fig ev . espb preferential orientation caused by an interaction with the air-water interface. (a–b) tomogram slice of espb - with nm thickness in x,y and x,z orientation, respectively.

fig ev . cryo-em analysis of espb structure. gold-standard fourier shell correlation (fsc) plot of espb - from m. tuberculosis (a) and espb - from m. marinum (b). (c) quality of cryo-em–derived density map. selected regions showing the fit of the derived atomic model to the cryo-em density map (black mesh). (d) size exclusion chromatograms of espb - from m. tuberculosis and mutants that affect oligomerisation.

fig ev . oligomerisation of espb is independent of the integrity of the pe-ppe linker. (a) trypsin digestion of different espb constructs from m. tuberculosis over – h. (b) structural model of an espb monomer (pdb id xxx) showing the pe-region (gold), ppe-region (grey) and the trypsin cleavage site (arrow, residues r –v ).
(c, d) native mass spectrometry of the trypsin-digested sample, raw and deconvoluted data. colour coding as in panel (b). inset table compares the respective mass of the fragments calculated from the sequence and native mass spectrometry. (e) sec (top) and sds-page (bottom) of undigested (black) and trypsin-digested (red) espb - .

fig ev . characterisation of the c-terminal region of espb. (a) kyte-doolittle hydrophobicity plot of residues – of espb from m. tuberculosis. inset shows the degree of hydrophobicity of residues – from different species. window size of was used as parameter. (b, c) far uv circular dichroism spectra of m. tuberculosis espb - at different ph and tfe concentrations. inset in (b) shows the spectrum-difference between ph . and ph . .

fig ev . higher-order oligomer formation. size exclusion chromatography profiles of espb - from m. tuberculosis ( mg/ml) injected onto a superdex increase / gl. inset corresponds to a blue-native page of the sec fractions highlighted in red.

heparan sulfate proteoglycans as attachment factor for sars-cov-

lin liu, pradeep chopra, xiuru li, kim m. bouwman, s. mark tompkins, margreet a. wolfert, robert p. de vries, and geert-jan boons*

complex carbohydrate research center, university of georgia, riverbend road, athens, ga , usa
department of chemical biology and drug discovery, utrecht institute for pharmaceutical sciences, and bijvoet center for biomolecular research, utrecht university, universiteitsweg , cg utrecht, the netherlands
center for vaccines and immunology, university of georgia, athens, ga , usa
department of chemistry, university of georgia, athens, ga , usa
these authors contributed equally to this work
*corresponding author. e-mail: gjboons@ccrc.uga.edu or g.j.p.h.boons@uu.nl

(which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . .

abstract

severe acute respiratory syndrome-related coronavirus (sars-cov- ) is causing an unprecedented global pandemic demanding the urgent development of therapeutic strategies.
microarray binding experiments using an extensive heparan sulfate (hs) oligosaccharide library showed that the receptor binding domain (rbd) of the spike of sars-cov- can bind hs in a length- and sequence-dependent manner. hexa- and octasaccharides composed of idoa s-glcns s repeating units were identified as optimal ligands. surface plasmon resonance (spr) showed that the sars-cov- spike protein binds heparin with much higher affinity (kd = nm) than the rbd alone (kd = µm). we also found that heparin does not interfere with angiotensin-converting enzyme (ace ) binding or proteolytic processing of the spike. our data support a model in which hs functions as the point of initial attachment for sars-cov- infection. tissue staining studies using biologically relevant tissues indicate that heparan sulfate proteoglycan (hspg) is a critical attachment factor for the virus. collectively, our results highlight the potential of using hs oligosaccharides as a therapeutic agent by inhibiting sars-cov- binding to target cells.

keywords sars-cov- , coronavirus, heparan sulfate, heparin, spike glycoprotein, microarray, surface plasmon resonance

introduction

the sars-cov- pandemic demands urgent development of therapeutic strategies. an attractive approach is to interfere with the attachment of the virus to the host cell. the entry of sars-cov- into cells is initiated by binding of the transmembrane spike (s) glycoprotein of the virus to angiotensin-converting enzyme (ace ) of the host. sars-cov is closely related to sars-cov- and employs the same receptor. the spike protein of sars-cov- comprises two subunits; s is responsible for binding to the host receptor, whereas s promotes membrane fusion.
the c-terminal domain (ctd) of s harbors the receptor binding domain (rbd). it is known that the spike protein of a number of human coronaviruses can bind to a secondary receptor, or co-receptor, to facilitate cell entry. for example, mers-cov employs sialic acid as co-receptor along with its main receptor dpp . human cov-nl , which also utilizes ace as the receptor, uses heparan sulfate (hs) proteoglycans as a co-receptor. it has also been shown that entry of sars-cov pseudo-typed virus into vero e and caco- cells can be substantially inhibited by heparin or treatment with heparin lyases, indicating the importance of hs for infectivity.

there are indications that the sars-cov- spike also interacts with hs. one early report showed that heparin can induce a conformational change in the rbd of sars-cov- . a combined spr and computational study indicated that glycosaminoglycans can bind to the proteolytic cleavage site of the s and s protein. several reports have indicated that heparin or related structures can inhibit the infection process of sars-cov- in different cell lines.

hs are highly complex o- and n-sulfated polysaccharides that reside as major components on the cell surface and extracellular matrix of all eukaryotic cells. various proteins interact with hs, thereby regulating many biological and disease processes, including cell adhesion, proliferation, differentiation, and inflammation. they are also used by many viruses, including herpes simplex virus (hsv), dengue virus, hiv, and various coronaviruses, as receptor or co-receptor. the biosynthesis of hs is highly regulated and the length, degree, and pattern of sulfation of hs can differ substantially between different cell types. the so-called “hs sulfate code hypothesis” is based on the notion that the expression of specific hs epitopes by cells makes it possible to recruit specific hs-binding proteins, thereby controlling a multitude of biological processes. in support of this hypothesis, several studies have shown that hs binding proteins exhibit preferences for specific hs oligosaccharide motifs. therefore, we were compelled to investigate whether the spike of sars-cov- recognizes specific hs motifs. such insight is expected to pave the way to develop inhibitors of viral cell binding and entry.

previously, we prepared an unprecedented library of structurally well-defined heparan sulfate oligosaccharides that differ in chain length, backbone composition and sulfation pattern. this collection of hs oligosaccharides was used to develop a glycan microarray for the systematic analysis of selectivity of hs-binding proteins. using this microarray platform in conjunction with detailed binding studies, we found that the rbd of sars-cov- -spike can bind hs in a length- and sequence-dependent manner. the observations support a model in which the rbd confers sequence selectivity, and the affinity of binding is enhanced by additional interactions with other hs binding sites in, for example, the s /s proteolytic cleavage site. in addition, it was found that heparin does not interfere with ace binding or proteolytic processing of the spike. tissue staining studies using biologically relevant tissues indicate that heparan sulfate proteoglycans (hspgs) serve as a critical attachment factor for the virus.

results and discussion

surface plasmon resonance (spr) experiments were performed to probe whether the rbd of sars-cov- spike protein can bind heparin.
biotinylated heparin was immobilized on a streptavidin-coated sensor chip and binding experiments were carried out using different concentrations of rbd, monomeric spike protein and trimeric spike protein of sars-cov- as analytes. the spike glycoprotein of sars-cov- (s +s , extracellular domain, amino acid residues - ) was expressed in insect cells with a c-terminal his-tag. recombinant sars-cov- rbd, containing amino acid residues - , was expressed in hek cells, also with a c-terminal his-tag. the spike protein trimer, having the furin cleavage site deleted and bearing two stabilizing mutations, was expressed in hek cells with a c-terminal his-tag. representative sensorgrams are shown in fig. . kd values were determined using a : langmuir binding model.
figure . spr sensorgrams representing the concentration-dependent kinetic analysis of the binding of immobilized heparin with sars-cov- related proteins: (a) rbd, (b) spike monomer, and (c) spike trimer. [plot panels not reproduced: response (ru) vs. time (s) for a dilution series of each analyte, with the fitted kd shown in each panel.]
the rbd binds to heparin with moderate affinity, having a kd value of ~ µm. the full-length monomeric spike protein showed a much higher binding affinity, with a kd value of nm. previously reported computational studies have indicated that the rbd may harbor an additional hs binding domain located either within or adjacent to the receptor binding motif. it has also been suggested that another hs-binding site resides in the s /s proteolytic cleavage site of the spike of the s domain. thus, the high affinity of the monomeric spike protein is probably due to the presence of an additional binding site in the spike protein, which greatly enhances its binding to heparin. the trimeric spike protein displayed a binding affinity (kd = nm) similar to that of the monomer. one of the putative heparin binding sites in the trimeric spike protein, the s /s proteolytic cleavage site, was mutated. thus, a possible increase in avidity due to multivalency may have been offset by the lack of a secondary binding site.
intrigued by these results, we examined whether the sars-cov- proteins bind to heparan sulfate in a sequence-dependent manner. we have developed an hs microarray having well over unique di-, tetra-, hexa-, and octa-saccharides differing in backbone composition and sulfation pattern (fig. c). the synthetic hs oligosaccharides contain an anomeric aminopentyl linker allowing printing on n-hydroxysuccinimide (nhs)-active glass slides. the hs oligosaccharides were printed at µm concentration in replicates of by non-contact piezoelectric printing. the quality of the hs microarray was validated using various well characterized hs-binding proteins. sub-arrays were incubated with different concentrations of sars-cov- rbd and spike protein in a binding buffer (ph . , mm tris, mm nacl, mm cacl , mm mgcl with % bsa and . % tween- ) at room temperature for h. after washing and drying, the sub-arrays were exposed to an anti-his antibody labeled with alexafluor® for another hour, washed and dried, and binding was detected by fluorescence scanning.
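the spr kd values reported earlier in this section were obtained with a langmuir binding model. as a rough illustration of what such a model computes, the python sketch below assumes a 1:1 interaction (one analyte molecule per immobilized ligand site). the function names are hypothetical, and the rate constants used in the note that follows are made-up values, not measurements from this study.

```python
import math


def equilibrium_kd(ka, kd_rate):
    """Equilibrium dissociation constant (M) from the association rate
    constant ka (1/(M*s)) and the dissociation rate constant kd_rate (1/s)."""
    return kd_rate / ka


def association_response(t, conc, ka, kd_rate, rmax):
    """SPR response at time t (s) during analyte injection, assuming a
    1:1 Langmuir interaction with analyte concentration conc (M)."""
    kobs = ka * conc + kd_rate            # observed rate constant
    req = rmax * ka * conc / kobs         # steady-state response at this conc
    return req * (1.0 - math.exp(-kobs * t))


def dissociation_response(t, r0, kd_rate):
    """SPR response t seconds after the injection ends, starting from r0."""
    return r0 * math.exp(-kd_rate * t)
```

with the hypothetical values ka = 1e5 and kd_rate = 1e-2, the equilibrium constant is 1e-7 m, and an analyte injected at exactly that concentration plateaus at half of rmax. in practice, ka and kd_rate are obtained by fitting these curves to the whole dilution series at once.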
to analyze the data, the compounds were arranged according to increasing backbone length, and within each group by increasing number of sulfates. intriguingly, the proteins showed a strong preference for specific hs oligosaccharides (fig. a, b). furthermore, it was found that the rbd, monomeric spike protein, and trimeric spike protein exhibit similar binding patterns (fig. s ). compounds showing strong responsiveness ( , , , and ) are composed of tri-sulfated repeating units (idoa s-glcns s). the binding is length-dependent, and hs oligosaccharides (idoa s-glcns s) and (idoa s-glcns s) , having four and three repeating units, respectively, showed the strongest binding. on the other hand, tetrasaccharide (idoa s-glcns s) , which has the same repeating unit structure, gave very low responsiveness. a similar observation was made for disaccharide (idoa s-glcns s).
figure . binding of synthetic heparan sulfate oligosaccharides to sars-cov- spike and rbd by microarray. (a) binding of sars-cov- spike ( µg/ml) to the heparan sulfate microarray. the strongest binding structures are shown as inserts. (b) binding of sars-cov- rbd ( µg/ml) to the heparan sulfate microarray. (c) compound numbering and structures of the heparan sulfate library.
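the ordering used for the array plots (backbone length first, then number of sulfates) amounts to a two-key sort. a minimal sketch follows, using a hypothetical record type and made-up entries rather than the actual library compounds:

```python
from typing import List, NamedTuple


class Compound(NamedTuple):
    # All fields are illustrative; the real library annotation may differ.
    label: str        # hypothetical identifier
    n_sugars: int     # backbone length (2 = di, 4 = tetra, 6 = hexa, 8 = octa)
    n_sulfates: int   # total number of sulfate groups


def arrange(compounds: List[Compound]) -> List[Compound]:
    """Order by increasing backbone length, then by increasing sulfate count."""
    return sorted(compounds, key=lambda c: (c.n_sugars, c.n_sulfates))


# Made-up example entries, listed in arbitrary order.
example = [
    Compound("x1", 4, 3),
    Compound("x2", 2, 1),
    Compound("x3", 4, 1),
    Compound("x4", 8, 12),
    Compound("x5", 2, 0),
]
```

calling arrange(example) groups the two disaccharides first, then the tetrasaccharides, then the octasaccharide, with the less sulfated member of each group listed first.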
[figure , panels a and b: fluorescence (au) for each compound, grouped by number of sugars (di, tetra, hexa, octa); panel c: structures of the library compounds, built from glca or idoa and glcnac, glcns or galnac units with varying sulfation. individual structure labels are omitted here.]
the structure-binding data show that perturbations in the backbone or sulfation pattern led to substantial reductions in binding. the importance of the idoa s residue is highlighted by comparing hexasaccharides with , in which a single idoa s in the distal disaccharide repeating unit is replaced with glca. this modification leads to a substantial reduction in responsiveness. further replacements of idoa s with glca in compound completely abolish binding, as is evident for compounds , , and . the structure-activity data also showed that the -o-sulfates are crucial, and binding was lost when such functionalities were not present ( vs. , , and ). lack of one or more -o-sulfates also resulted in substantial reductions in binding ( vs. and ).
although the sars-cov- spike and rbd showed similar selectivities, the binding of the spike appeared stronger, and much higher fluorescence readings were observed at the same protein concentration.
next, we examined whether hs oligosaccharide can interfere with the interaction of the spike or rbd with immobilized heparin. thus, the spike protein ( nm) or rbd ( . µm) was pre-mixed with different concentrations of compound and then used as analyte. the ic values were determined by non-linear fitting of log(inhibitor) vs. response using variable slope (fig. s ). the ic values for the spike protein and rbd are nm and nm, respectively.
to further determine the possible role of hs in the infection process, we examined the binding affinities of spike proteins to ace and compared these with binding affinities for heparin. biotinylated ace was immobilized on a streptavidin-coated sensor chip and binding experiments were performed with different concentrations of the sars-cov- derived proteins. representative sensorgrams for the rbd, monomeric spike protein, and trimeric spike protein are shown in fig. . kd values of . nm, . nm and . nm, respectively, were determined using a : langmuir binding model, which are in agreement with reported data. these data show convincingly that the rbd has a much higher affinity for ace than for heparin.
figure . sensorgrams representing the concentration-dependent kinetic analysis of the binding of immobilized ace with sars-cov- derived proteins: (a) rbd, (b) spike monomer, and (c) spike trimer. (d) comparison of the kd values of heparin binding and ace binding to sars-cov- related proteins. [plot panels not reproduced: response (ru) vs. time (s) for a dilution series of each analyte, with the fitted kd shown in each panel; panel d is a table with columns protein, heparin binding kd (nm) and ace binding kd (nm), and rows rbd, spike monomer and spike trimer.]
a number of reports have indicated that heparin and related compounds can block infection of cells by sars-cov- . therefore, we were compelled to investigate the molecular mechanisms by which heparin blocks viral entry. it is possible that the anti-viral properties of heparin are due to binding to the rbd, thereby blocking the interaction with ace . alternatively, heparin may interfere with the proteolytic processing of the spike protein, thereby preventing membrane fusion. in this respect, the spike of sars-cov- contains a unique furin cleavage site, which is not present in other covs and has been proposed to contribute to high infectivity, because cleavage of the spike protein is a prerequisite for membrane fusion. modeling studies have indicated that the furin cleavage site may harbor a binding site for hs. finally, hs may function as an attachment factor and the addition of exogenous heparin may interfere with this process.
to examine whether heparin can interfere with binding of the spike to ace , we performed microarray experiments in which biotinylated fc tagged ace ( µg/ml) was printed onto streptavidin coated microarray slides. the printing quality was confirmed by using a goat-anti-human fc antibody conjugated with alexafluor® (fig. s a). next, his-tagged rbd and monomeric spike protein were premixed with different concentrations of heparin, and binding of the proteins to immobilized ace was detected using an anti-his antibody.
soluble human ace was used as positive control. although ace efficiently inhibited rbd and spike binding (fig. s b, c), no substantial changes in binding were observed in the presence of µg/ml and µg/ml of heparin (fig. a, b). furthermore, we immobilized the rbd and monomeric spike proteins on elisa plates and assayed the binding of ace to the spike proteins in the presence or absence of heparin (fig. c, d). soluble human ace was used as a positive control, which as expected exhibited potent inhibition. at µg/ml of heparin, no inhibition of binding was observed for either rbd or monomeric spike protein. these results indicate that heparin does not substantially interfere with the interaction of the spike with ace .
to investigate whether the binding of heparin can hinder cleavage of the spike protein by furin, we exposed the monomeric spike protein to furin in the presence of different concentrations of heparin and examined protein cleavage by sds-page. the spike protein was readily cleaved by furin even in the presence of a high concentration of heparin ( µg/ml), while µg/ml of a known furin inhibitor completely abolished cleavage.
figure . (a) influence of heparin on the binding of his-tagged rbd or (b) his-tagged spike monomer to biotinylated human ace immobilized on streptavidin coated microarray slides. detection of rbd and spike was accomplished using an anti-his antibody labeled with alexafluor . (c) influence of heparin on the binding of biotinylated human ace to rbd and (d) to spike monomer immobilized on high surface microtiter plates. binding was detected by treatment with streptavidin-hrp followed by addition of a colorimetric hrp substrate. (e) western blot analysis of furin-mediated cleavage of spike monomer in the presence and absence of heparin or a known furin inhibitor (hexa-d-arginine). [plot panels not reproduced: panels a and b show rfu vs. heparin (µg/ml); panels c and d show absorbance ( nm) vs. heparin (µg/ml); panel e shows bands for spike monomer and cleaved s under combinations of heparin, furin and hexa-d-arg.]
it is also possible that heparin interferes with the initial attachment of the virus to the glycocalyx, thereby preventing infection. therefore, we examined the importance of hs for binding of trimeric rbd to relevant tissues. ferrets are a susceptible animal model for sars-cov- and closely related minks are easily infected on farms. formalin-fixed, paraffin-embedded lung tissue slides resemble the complex membrane structures to which spike proteins need to bind before they can engage with ace for cell entry. expression of ace was assessed using an ace antibody, allowing us to compare its binding with that of the sars-cov- rbd protein in terms of localization and dependency on hs. the ace antibody (fig. a) and the rbd trimer (fig. b) bound efficiently to the ferret lung tissues. we also examined a commonly used heparan sulfate antibody, which bound efficiently to ferret lung tissue, indicating the omnipresence of hs. after overnight exposure to heparanase (hpse), the ace antibody staining was mostly unaffected, indicating hspg-independent binding. on the other hand, the sars-cov- rbd trimer was not able to engage with the ferret lung tissue slide after hpse treatment. no staining was observed with the heparan sulfate antibody ( e ), indicating that all hs had been removed.
thus, these results indicate that hs is required for initial cell attachment before the spike can engage with ace .
figure . binding of ace antibody, sars-cov- rbd, and heparan sulfate antibody to ferret lung serial tissue slides. (a) ace antibody staining without and after hpse treatment. (b) sars-cov- rbd staining without and after hpse treatment. (c) heparan sulfate antibody ( e ) staining without and after hpse treatment. hpse treatment was achieved by overnight incubation of the tissues with hpse ( . µg/ml) at °c.
discussion and conclusions
the glycan microarray and spr results indicate that the spike of sars-cov- can bind hs in a length- and sequence-dependent manner, and hexa- and octa-saccharides composed of idoa s-glcns s repeating units have been defined as optimal ligands. the data support a model in which the rbd of the spike confers sequence specificity and an additional hs binding site in the s /s proteolytic cleavage site enhances the avidity of binding, probably by non-specific interactions. in a biorxiv preprint, we presented, for the first time, experimental support for such a model, and subsequent papers have confirmed that the rbd harbors an hs binding site. although idoa s-glcns s sequons are abundantly present in heparin, they are a minor component of hs.
interestingly, it has been reported that the expression of the (glcns s-idoa s) motif is highly regulated and plays a crucial role in cell behavior and disease, including endothelial cell activation. severe thrombosis in covid- patients is associated with endothelial dysfunction, and a connection may exist between the ability of sars-cov- to bind to hs and thrombotic disorder. it is also possible that hs is a determinant of cell and tissue tropism.
a number of reports have shown that heparin and related products can block infection by pseudotyped virus or authentic sars-cov- virus. we explored the possibility that binding of heparin blocks the rbd from interacting with ace . however, in two experimental formats such properties were not observed. we found that the affinity of the rbd for heparin is much lower than that for ace , providing a rationale for the inability of heparin to inhibit the binding of rbd or spike to ace . one computational study has indicated that ace and hs bind to the same region of the rbd. another docking study located the hs binding site adjacent to the ace -binding site and inferred a model in which a ternary complex is formed between rbd, hs and ace . further studies are required to determine the exact location of the hs binding site, which in turn may provide a better understanding of the interplay between binding of the spike to ace and to heparin. we employed physiologically relevant tissues to explore the importance of hs for sars-cov- adhesion and demonstrated that hpse treatment greatly reduces binding of the rbd but not that of the ace antibody. the data support a model in which hs functions as a host attachment factor that facilitates sars-cov- infection.
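the affinity argument above can be made explicit with a textbook competitive-binding expression. the sketch below assumes, purely for illustration, that heparin (h) and ace (a) compete for a single site on the rbd, which is one of the two possibilities the docking studies leave open. the symbols k_a and k_h (the equilibrium dissociation constants for ace and heparin) and the bracketed free concentrations are notation introduced here, not taken from the measurements.

```latex
% Fraction of RBD occupied by ACE when heparin competes for the same site
\[
  \theta_{A} \;=\; \frac{[A]/K_{A}}{1 + [A]/K_{A} + [H]/K_{H}}
\]
% Heparin concentration that halves ACE occupancy at a fixed [A]:
% setting \theta_A equal to half of its value at [H] = 0 gives
\[
  [H]_{1/2} \;=\; K_{H}\left(1 + \frac{[A]}{K_{A}}\right)
\]
```

under this assumption, the heparin concentration needed to halve ace binding scales with k_h and grows further as [a] exceeds k_a, so a ligand with a much weaker affinity than ace would have to be present in large excess to compete. if the two sites do not overlap, as in the ternary complex model, this expression does not apply.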
the current clinical guidelines call for the use of unfractionated heparin or low molecular weight heparin (lmwh) for the treatment of all covid- patients for systemic clotting in the absence of contraindications. heparin treatment may have additional benefits and may compete with the binding of the spike protein to cell surface hs, thereby preventing infectivity. our data suggest that non-coagulating heparin or hs preparations can be developed that reduce cell binding and infectivity without a risk of causing bleeding. in this respect, administration of heparin requires great care because its anticoagulant activity can result in excessive bleeding. antithrombin iii (at-iii), which confers anticoagulant activity, binds a specific pentasaccharide glcnac( s)-glca-glcns( s)( s)-idoa s-glcns( s) embedded in hs or heparin. removal of the sulfate at c- of n-sulfoglucosamine (glcns s) of the pentasaccharide results in a -fold reduction in binding affinity. importantly, such a functionality is not present in the identified hs ligand of the sars-cov- spike, and therefore compounds can be developed that inhibit cell binding but do not interact with at-iii. as a result, such preparations can be used at higher doses without causing adverse side effects. our data also show that multivalent interactions of the spike with hs result in high avidity of binding. this observation provides opportunities to develop glycopolymers modified by hs oligosaccharides as inhibitors of sars-cov- cell binding to prevent or treat covid- .
acknowledgments
this research was supported by the national institutes of health (p gm and r hl to g.-j.b.). r.p.dv is a recipient of an erc starting grant from the european commission ( ) and a beijerinck premium of the royal dutch academy of sciences. we thank sander herfst (department of viroscience, erasmus medical center) for the ferret tissues and gavin wright (addgene) for providing hpse-bio-his (plasmid # ).
plasmids for expression of sars-cov- spike and rbd proteins were provided by dr. florian krammer (icahn school of medicine at mount sinai, produced under niaid ceirs contract hhsn c). production of recombinant proteins was supported by niaid centers of excellence for influenza research and surveillance (ceirs) contract hhsn c to s.m.t.
biochemical, structural insights of newly isolated aa family of lytic polysaccharide monooxygenase (lpmo) from aspergillus fumigatus and investigation of its synergistic effect using biomass. musaddique hossain, subba reddy dodda, bishwajit singh kapoor, kaustav aikat, and sudit s. mukhopadhyay* department of biotechnology, national institute of technology durgapur- , west bengal, india running title: biochemical, structural insights, and investigation of the synergistic effect of newly isolated aa family of lytic polysaccharide monooxygenase (lpmo) from aspergillus fumigatus. * to whom the corresponding author should be addressed. e-mail: suditmukhopadhy@yahoo.com phone: + (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . abstract the efficient conversion of lignocellulosic biomass into fermentable sugar is a bottleneck for the cheap production of bio-ethanol. the recently identified enzyme lytic polysaccharide monooxygenase (lpmo) family has brought new hope because of its boosting capabilities of cellulose hydrolysis.
in this report, we identify and characterize an lpmo of a new auxiliary activity (aa ) family from the genome of a locally isolated thermophilic fungus, aspergillus fumigatus (nitdgpka ), and evaluate its capacity to boost biomass hydrolysis. aflpmo is an intronless gene that encodes a kda protein. sequence-wise, it is close to the c type of aaaa and the cellulose-active aa family of lpmos, but its predicted three-dimensional structure resembles the aa family of lpmo (pdb id: mah). the gene was expressed in pichia pastoris under an inducible promoter (aox ) with a c-terminal his tag. the protein was purified using ni-nta affinity chromatography, and the enzyme kinetics were studied with , -dimethoxyphenol. we observed polysaccharide depolymerization activity with carboxymethyl cellulose (cmc) and phosphoric acid swollen cellulose (pasc). moreover, the simultaneous use of a commercial cellulase cocktail and aflpmo enhances lignocellulosic biomass hydrolysis by -fold, which is the highest reported so far in the lpmo family.

importance
auxiliary enzymes such as lpmos have industrial importance. these enzymes are used in cellulolytic enzyme cocktails because of their synergistic effect with cellulases. in our study, we have biochemically and functionally characterized the new aa family lpmo from aspergillus fumigatus (nitdgpka ). the biochemical characterization provides the fundamental scientific elucidation of the newly isolated enzyme. the functional characterization shows that combining aflpmo with a commercial cellulase cocktail enhances biomass degradation activity by -fold. this enhancement is the highest reported so far, which gives aflpmo enormous potential for industrial use.

keywords: a. fumigatus, auxiliary activity, cloning, kinetics, lpmo, lignocelluloses, molecular docking
introduction
the depletion of fossil fuels and growing concern over environmental consequences, particularly climate change, have steered our fast-growing economy toward clean and renewable energy production [ ]. among renewable energy sources, bioethanol is a promising alternative to fossil fuel because of its low co emission [ , ] and because it is manufactured from lignocellulosic biomass, which is bio-renewable and abundant on earth. however, the structural complexity and recalcitrance of this renewable carbon source [ ] have hindered its optimal use. the current process of saccharification of lignocellulosic biomass is time-consuming and costly. therefore, the need for cost-effective, fast, and controlled breakdown of lignocellulose has driven the bioethanol industry to explore accessory enzymes in order to achieve a better and more efficient enzyme cocktail for the commercial production of lignocellulose-derived ethanol. a breakthrough came when a mono-copper redox enzyme, known as lytic polysaccharide monooxygenase (lpmo), was first reported in [ - ]. lpmo increases lignocellulosic biomass conversion efficiency [ , ] by catalyzing the hydroxylation of the c and/or c carbon involved in the glycosidic bonds that connect glucose units in cellulose, which allows cellulase enzymes to process the destabilized complex polysaccharides [ - ]. harris et al. used an lpmo from t. reesei along with classical cellulases and showed that the degradation of polysaccharide substrates increased by a factor of two compared with the activity of classical cellulases alone [ ]. a cbm domain-containing enzyme identified from serratia marcescens, which boosts chitinase activity, was later classified as an lpmo. a study by nakagawa et al.
showed that an aa family lpmo from streptomyces griseus could increase the efficiency of chitinase enzymes by - and -fold on the α and β forms of chitin, respectively [ ]. in addition, there are recent reports of the synergistic effect of lpmos with glycoside hydrolases on polysaccharide substrates [ - ]. lpmos are classified as aa , aa , aa , aa , aa , and aa in the cazy database (http://www.cazy.org/), based on their amino acid sequence similarity. recently, filiatrault-chastel et al. identified aa , a new family of lpmo, from the secretome of the fungus aspergillus aculeatus (aaaa ). aaaa was initially isolated as an x protein (unnamed domain) and later identified as a c -oxidizing lpmo active on cellulose [ ]. aaaa is the only member of the aa family identified so far, and it lacks complete biochemical characterization. biochemical characterization, structural characterization, and assessment of biomass conversion efficiency are required to better understand the action of members of this new family on plant biomass and their possible biological roles.

while analyzing the cellulose-hydrolyzing genes in the genome of a. fumigatus (aspergillus genome database), we identified five lpmos, one of which belongs to the aa family because of its x domain. we then cloned the aflpmo gene from the genome of our locally isolated strain of a. fumigatus (nitdgpka ) [ ] (genbank accession no. jq ) by designing primers based on the a. fumigatus lpmo sequence (caf . ) (ncbi). the cloned a. fumigatus (nitdgpka ) lpmo (sequence submitted to genbank after cloning and sequencing; accession no. mt ) was expressed in pichia pastoris x .
the heterologous protein (aflpmo ) was purified and used for biochemical and functional characterization. the saccharification rate assessment suggests that aflpmo releases glucose quickly and effectively from lignocellulose and cellulose when used with a commercial cellulase cocktail. enzyme kinetics using , -dimethoxyphenol as a substrate [ ] confirmed the oxidative activity. the conversion efficiency of lignocellulosic biomass (alkaline pre-treated raw rice straw) in combination with cellulases suggests that aflpmo could be an essential member of the cellulase cocktail for industrial use.

results
cloning, expression, and purification of aflpmo
aflpmo (genbank accession no. mt ) is an intronless, nucleotide long gene that encodes amino acids. the theoretical molecular mass is kda (including signal peptide). the gene sequence of aa from our isolated strain of a. fumigatus (nitdgpka ) showed almost . % homology with the gene sequence of aa in the genome database of a. fumigatus (caf . ) (ncbi database). the aflpmo protein (genbank accession no. mt ) was produced in pichia pastoris x without its c-terminal extension. after optimization of the expression procedure, we achieved approximately . mg/ml of purified protein. sds-page analysis (fig ) confirmed a single band of the purified protein (fig. : lanes and ). we further confirmed the purified recombinant protein bearing the x his-tag by western blot using an anti-his antibody (fig. : lanes w & w ); the purified protein (lanes & of sds-page) was used for the western blot. the expressed recombinant aflpmo band appeared at approximately kda in sds-page (fig. ), which is slightly higher than the expected size.
this is probably due to glycosylation [ ], or because the recombinant protein carries a c-myc epitope and a x his tag at its c-terminus, which can increase the molecular mass by . kda. to further examine n-glycosylation, we checked the aflpmo sequence for glycosylation sites using the netnglyc . server (dtu bioinformatics, technical university of denmark, http://www.cbs.dtu.dk/services/netnglyc/) [ ]. two n-glycosylation sites exceeded the . threshold value, at amino acid positions & , with potential values of . and . , respectively.

enzyme assay and kinetics
lpmo converts , -dimethoxyphenol ( , -dmp) into -coerulignone (fig. a) due to its oxidative property, and -coerulignone has an extinction coefficient of . -coerulignone absorbs at nm wavelength; therefore, it can easily be quantified using a spectrophotometer [ ]. the od at nm increases steadily with time, which clearly indicates the steady conversion of , -dimethoxyphenol to -coerulignone (fig. a). it also indicates sufficient activity of the enzyme aflpmo . temperature and ph influence the activity of lpmo; thus, during the kinetic study, we used the optimum temperature and ph . , as described by [ ]. aflpmo was active on the chemical substrate , -dimethoxyphenol: there was a steady release of -coerulignone when , -dimethoxyphenol was incubated with aflpmo . the enzyme kinetics were performed with different concentrations of , -dimethoxyphenol. we obtained the kinetic parameters, the michaelis-menten constant (km) and the maximum velocity (vmax), from the lineweaver-burk plot (fig. b) as . mm and . u/mg, respectively. the calculated catalytic activity kcat was . min - (table ). these kinetic parameters support the oxidative property of aflpmo .

in-silico analysis for substrate specificity
aflpmo contains a amino acid long n-terminal signal peptide before the his catalytic domain ( - aa) and a c-terminal serine-rich region ( - aa) (fig. a).
this n-terminal sequence is one of the marker features of fungal lpmos, whereas the serine-rich c-terminal region or linker is a feature of the aa family. like other aa lpmos, it also lacks a cbm module or glycosylphosphatidylinositol (gpi) anchor [ ]. aflpmo also has conserved histidines at the st and th positions, which are mainly involved in copper binding, the signature characteristic of lpmos. other conserved residues include gly, pro, asn, cys, try, tyr, leu, and asp, as well as the gnv(i)qgelq motif (fig. b). the fully conserved residues (highlighted with a red background) are the marker amino acids that represent the lpmos. the partially conserved residues (within the blue boxes) are markers of the different auxiliary families (fig. b). sequence alignment of the aa family (including aflpmo ) with other lpmo families (aa , aa , and aa ) suggested (fig. s ) a correlation between the aa family and aa lpmos. the substrate-binding motif in the l loop of cellulose-active lpmos has some similarities with the aa l loop motif (marked with a black box) and the cellulose-active motif (fig. b). in aa lpmos, the conserved l loop motif gni(v)qgel is replaced by ynwfg(a)nl in c oxidizing aa lpmos, which are also cellulose active. a previous study suggests that the amino acids (y , n , f , y , and w ) in loop l take part in substrate specificity for lpmo , and that mutations (y , n d, f a, y f, w q) alter the substrate specificity from chitin to cellulose [ ]. in aflpmo , the corresponding amino acids are gnqyr (fig. b) (marked with black arrows); some of the amino acids at these positions (n & y) are also present in cellulose-active aa lpmos.
the polar amino acids (q & r) are presumably charged and may interact with chitin through electrostatic interaction. alternatively, it is likely that a few mutations in these amino acids would help aflpmo interact with chitin. further, in chitin-active lpmos, more than % of the residues of the motif (y(w)epqsve) are polar, including two negatively charged glu (e). in cellulose-active lpmos, % of the residues of the motif (y(w)nwfgvl) are hydrophobic [ ]. in contrast, in aflpmo , % of the residues are polar, including one negatively charged glu (e) and one hydrophobic tyr (y), and the others are neutral. the presence of polar residues and a negatively charged glu (e) suggests that aflpmo may bind to chitin. electrostatic interaction between the substrate and the enzyme active site plays a pivotal role in substrate binding. the electrostatic potential surface at the catalytic site of aflpmo was found to be uncharged or slightly positively charged at ph . (fig. c) (marked in the figure). the electrostatic interaction study suggests that aflpmo may also bind to cellulose [ ].

regioselectivity of aflpmo
amino acids on the substrate-binding surface determine the oxidative regioselectivity of lpmos [ ]. sequence comparison and mutation studies revealed that the conserved amino acids near the catalytic center in c and c /c oxidizing aa and aa lpmos are responsible for regioselectivity. in the case of c /c oxidizing aa , the amino acid asn near the catalytic center is responsible for c oxidizing activity. alteration of this amino acid (n f) diminished the c activity and produced only c oxidized product [ ]. in c oxidizing aa lpmos, the hydrophobic amino acids phe and tyr are conserved in addition to asn.
in c oxidizing aa lpmos, by contrast, phe replaces the corresponding asn (fig. b) (marked with a red arrow). the phe is also parallel to the substrate-binding surface [ ]. in aa , the corresponding gln (q) may be parallel to the substrate-binding region (fig. b). the function of the conserved gln (q) is not clear; however, this polar amino acid has a side chain similar to that of the polar asn (n). the axial distance between the conserved amino acid and the copper catalytic center is another crucial factor for regioselectivity. the c /c oxidizing aa lpmos have more open or wider axial gaps than c oxidizing aa lpmos [ ]. here, the distance between gln and his is . Å, and the distance between gln and the cu catalytic center is . Å. in the absence of an aa structure (crystal or model), we cannot compare the lengths; nevertheless, this distance may play a key role in regioselectivity.

phylogenetic tree construction and analysis
the sequence and functional relationships of aa and aa lpmos have been discussed above, but phylogenetic studies based on sequence similarity indicate their evolutionary origin. based on sequence comparison, aflpmo is evolutionarily closest to the lpmo of aspergillus fisheri ( % sequence homology). the constructed phylogenetic tree contains two main clades and two subclades (fig. ). the first clade contains all aa lpmos from bacterial species such as bacillus thuringiensis, bacillus amyloliquefaciens, streptomyces lividans, and enterococcus faecalis. the second clade includes all fungal aa and aa lpmos, which mainly belong to aspergillus and penicillium species; the aa lpmos are mostly from a. niger, a. fumigatus, a. fisheri, and aspergillus kawachii (fig. ).

model structure prediction and molecular docking analysis
i-tasser was used to predict the three-dimensional structure of aflpmo .
most lpmos have an immunoglobulin-like, distorted β-sandwich fold, in which loops connect seven antiparallel β-strands with a varying number of α-helix insertions (fig. a). the final model has a β-sandwich structure connected by loops with two α-helices. superimposition of aflpmo with other lpmo families such as aa , aa , aa , and aa showed that they share common antiparallel β-strands and helices, with more loops, which indicates higher flexibility. moreover, aflpmo showed an rmsd of . Å with the aa lpmo (pdb id: mah), lower than with the other lpmos; the d structure of aflpmo therefore suggests that it has the greatest structural resemblance to the aa lpmo. we also found one disulfide bond in aflpmo , between cys -cys , a signature of thermostability (fig. s ). the histidine brace amino acids, his and his , participate in coordination bonds with cu ions. the surface of aflpmo has an active site (fig. b). interaction studies with cellohexose suggest that amino acids such as gln , gln , ser , his , his , asn , asp , tyr , and glu (the active enzyme starts with his ; so his will be his and the corresponding amino acids can be numbered accordingly) are in the active site and are involved in the interaction with the substrate (fig. c). molecular docking suggests that aflpmo has a cellulose-binding surface (fig. b & c). this study also suggests that the binding energy between aflpmo and cellulose is - . kcal/mol, which indicates stronger binding than to chitin (- . kcal/mol) and other polysaccharides.

polysaccharides depolymerization by aflpmo
aflpmo showed efficient depolymerization activity on both cmc and pasc (fig. a & b). we quantified the amount of reducing sugar released by enzymatic degradation.
when cmc was incubated with increasing concentrations of the enzyme, the amount of product (reducing sugar) increased with the aflpmo concentration (fig. a). when we added µg of the enzyme, nearly . mg/ml of reducing sugar was released. for µg of the enzyme, the product was nearly . mg/ml, and for µg of the enzyme, the amount of product released was approximately . mg/ml (fig. a). this result indicates the polysaccharide (cmc) depolymerization activity of aflpmo . further, we used insoluble pasc as a substrate, incubated it with increasing concentrations of aflpmo , and determined the relative absorbance of pasc as the amount of enzyme increased. the enzyme degrades the polysaccharide substrate into smaller, soluble units (monosaccharides, disaccharides, etc.), which makes the reaction mixture clearer. this leads to a decrease in absorbance and hence an increase in relative absorbance [ ], so a plot of relative absorbance should rise with increasing concentration of aflpmo . accordingly, in this experiment we found a rise in relative absorbance with respect to the untreated substrate at high concentrations of aflpmo (fig. b). the graph (fig. b) showed a . absorbance difference with respect to the untreated substrate when we used µl (concentration . µg/µl) of the enzyme. the difference in absorbance increased steadily with increasing enzyme concentration (with µl of the enzyme at a concentration of . µg/µl, the relative absorbance reached nearly . ). hence, these experiments confirmed that aflpmo , like other lpmos, has an intrinsic polysaccharide degradation property.
in these experiments, we used heat-inactivated aflpmo and an ascorbic acid-deficient set as controls to verify these results (data not shown).

pre-treated lignocellulosic biomass and cellulose hydrolysis with simultaneous treatment of aflpmo and commercial cellulase
there are two assay modes for showing the synergy or boosting effect of lpmo when used with cellulase: the sequential assay and the simultaneous assay. in the sequential assay, lpmo is added for a set time before the cellulase. in the simultaneous assay, lpmo and cellulase are applied to the substrate together. in this study, we chose the simultaneous assay for two reasons. first, the simultaneous assay shows better synergy or boosting on crystalline cellulose [ ] than the sequential one. second, we aimed to check whether aflpmo stimulates the activity of commercial cellulase, so that it may be included in the cocktail for better depolymerizing action. here, the boosting effect of aflpmo was studied with a commercial cellulase cocktail on both cellulose (avicel) and lignocellulosic biomass (alkaline pre-treated rice straw). alkaline pre-treatment has an advantage over acid pre-treatment in terms of hydrolysis yield [ ], because alkaline pre-treatment sufficiently removes the lignin [ ] but preserves hemicelluloses [ ]. when avicel was incubated with aflpmo and cellulase, the amount of reducing sugar released was almost double that from avicel incubated with either cellulase alone or cellulase along with heat-inactivated aflpmo (fig. b). we observed a similar boosting effect at every time point from hrs to hrs. we also found a synergistic impact of aflpmo on the transformation of lignocellulosic biomass to fermentable sugar (fig. a). when we incubated the alkaline pre-treated rice straw with µg and µg of aflpmo along with cellulase, almost .
fold and slightly above -fold of reducing sugar were released, respectively, compared to lignocellulose incubated with either cellulase alone or cellulase along with heat-inactivated aflpmo (fig. a), which suggests that the enhancement depends on the amount of the auxiliary enzyme aflpmo . to further elaborate the synergistic effect of aflpmo , another set of reactions was prepared in which the biomass was treated with increasing concentrations of aflpmo alone. a minimal amount of hydrolysis activity was observed: nearly . mg/ml to . mg/ml of reducing sugar was quantified for biomass treated with aflpmo (fig. c). this hydrolysis activity of aflpmo alone is negligible compared to that of biomass treated with cellulase only. nevertheless, the simultaneous use of aflpmo and cellulase enhances the hydrolysis activity two-fold compared to biomass treated with cellulase only (fig. c). this result strongly indicates the synergistic effect of aflpmo with cellulase. together, these results confirm the boosting or synergistic effect of aflpmo on the hydrolytic activity of cellulase for both cellulosic and lignocellulosic biomass degradation. the highest synergistic effect reported so far was for aa (table ), which is less than two-fold [ , ].

discussion
the gene was cloned in the ppiczαa vector under the control of the aox promoter, following the same strategy developed for aaaa and pmo a_mlaci [ , ]. the nucleotide sequence of aflpmo was codon-optimized for pichia pastoris.
the recombinant protein containing a c-terminal polyhistidine tag was produced in flasks in the presence of trace metals, including copper, and purified from the culture supernatant by immobilized metal ion affinity chromatography (imac: ni-nta affinity chromatography), following the same protocol used for aaaa [ ]. we succeeded in producing active aflpmo in p. pastoris x (fig. ) in a shake flask. despite the chance of n-terminal modification in shake flask culture as opposed to bioreactor culture [ ], the amount of active enzyme obtained in the shake flask was sufficient for characterization. the enzyme activity was determined with , -dimethoxyphenol, using the heat-inactivated enzyme and a reaction without ascorbic acid as negative controls (data not shown). the enzyme activity indicates the successful production of active protein (fig. a); interestingly, the initial reaction rate is faster than the rate at later time points. lytic polysaccharide monooxygenases (lpmos) release a spectrum of cleavage products from their polymeric substrates cellulose, hemicellulose, or chitin. the correct identification and quantitation of these released products is the basis of ms/hplc-based detection methods for lpmo activity, which are time-consuming and require specialized laboratories to measure lpmo activity in day-to-day work. a spectrophotometric assay based on , -dimethoxyphenol can accurately measure the enzymatic action; it can be used for enzyme screening, production, and purification, and can also be applied to study enzyme kinetics [ ]. thus it is swift and robust for biochemical characterization, and also accurately determines the active enzyme.
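as an illustration of how kinetic parameters can be read from a lineweaver-burk plot of spectrophotometric rate data, the following is a minimal sketch and not the authors' analysis code. it fits the double-reciprocal line 1/v = (km/vmax)(1/[s]) + 1/vmax by ordinary least squares. the function names, the substrate concentrations, and the km and vmax values used to generate the rates are hypothetical and chosen only to make the example self-checking; they are not measurements from this study.

```python
def michaelis_menten(s, km, vmax):
    """Initial rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def lineweaver_burk_fit(substrate, rates):
    """Fit 1/v = (km/vmax) * (1/[S]) + 1/vmax by ordinary least squares.

    Returns (km, vmax) in the units of the inputs.
    """
    if len(substrate) != len(rates) or len(substrate) < 2:
        raise ValueError("need at least two paired (substrate, rate) points")
    xs = [1.0 / s for s in substrate]   # 1/[S]
    ys = [1.0 / v for v in rates]       # 1/v
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # equals km / vmax
    intercept = mean_y - slope * mean_x  # equals 1 / vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


# hypothetical, noise-free example data (not values from this study)
substrate_mm = [0.5, 1.0, 2.0, 4.0, 8.0]
rates = [michaelis_menten(s, km=2.0, vmax=10.0) for s in substrate_mm]
km_est, vmax_est = lineweaver_burk_fit(substrate_mm, rates)
print(f"km ~ {km_est:.3f}, vmax ~ {vmax_est:.3f}")
```

the double-reciprocal transform gives the largest weight to the lowest substrate concentrations, so with noisy measurements a direct nonlinear fit of the michaelis-menten equation is often a more robust alternative.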
sequence analyses indicate that aflpmo has some signature characteristics for both cellulose and chitin binding and for both c and c /c oxidizing activity. however, experimental confirmation is required to establish the presence or absence of any chitin-binding nature and c /c oxidizing capability of aflpmo . the constructed phylogenetic tree (fig. ) suggests that the fungal aa and aa lpmos are likely to derive from a common ancestor. the molecular docking study suggests that, based on the binding energy, aflpmo has the highest affinity towards cellulose among the known substrates. the binding energy between cellulose and aflpmo is - . kcal/mol, which indicates thermodynamically stronger binding between enzyme and substrate (fig. b & c) than with other substrates. lpmos are important for their auxiliary activity and polysaccharide-degrading property. we observed polysaccharide depolymerizing activity on carboxymethyl cellulose (cmc) and phosphoric acid swollen cellulose (pasc) (fig. a & b). owing to its auxiliary activity, lpmo enhances the action of the cellulase enzyme in the degradation of cellulose and lignocelluloses [ ]. the only previously identified member of the aa family, aaaa , showed a sequential boosting effect with t. reesei cbhi on nano-fibrillated cellulose (nfc) and pasc. aaaa , the recent addition of the aa family of lpmo to the cazy database, showed synergism with cbh in the degradation of cellulose [ ]. however, the aaaa study did not deal with the biomass hydrolysis boosting effect of the aa family. the boosting result is the most important technical aspect for enhancing the activity of the cellulase cocktail. lpmo enzymes have attracted much research interest because of their synergistic or boosting effect on cellulase enzymes [ ]. aflpmo showed a boosting effect on cellulose and lignocellulose hydrolysis (fig. a & b). the synergism of aflpmo is shown in (fig.
c), where the hydrolysis activity of biomass treated with aflpmo alone or cellulase alone is low compared to the combined effect of the two enzymes. the simultaneous use of aflpmo and cellulase enhances biomass hydrolysis nearly two-fold compared to treatment with cellulase alone. this two-fold enhancement of biomass hydrolysis is higher than that of other lpmo families [ ]. however, the synergy or boosting effect depends on many factors, such as the pre-treatment [ ], the lignin content of the lignocelluloses, and the acting cellulase [ ]. still, an enhancement of over % makes a strong case for inclusion in the cellulase cocktail. the mechanism of synergism with the cellulase enzyme complex is, however, poorly understood. a probable explanation of the boosting effect is that the cellulosic biomass is partially depolymerized by the lpmo, which gives the cellulase enzymes further access.

conclusion
in conclusion, aflpmo is the second reported member of the aa family of lpmo, and this is the first time the aa family has been characterized biochemically and structurally. in-silico sequence analysis, structure analysis, and molecular docking studies suggest some unique characteristics of aflpmo , such as cellulose-binding ability, possible chitin binding, and c and c oxidizing properties. further studies, including engineering approaches, are required to confirm these characteristics. nevertheless, the most crucial aspect of aflpmo is its significant boosting effect on a commercial cellulase cocktail in lignocellulosic biomass conversion, which suggests its importance in the bioethanol industry.

materials and methods
sequence analysis and phylogenetic analysis: aflpmo sequence (caf .
) was obtained from ncbi, and the sequence was further confirmed against the aspergillus genome database (http://www.aspgd.org/). to avoid interference from the presence or absence of additional residues or domains, the signal peptides and c-terminal extensions were removed before alignment. homology searches were performed with blast [ ], and clustal omega [ ] was used for multiple sequence alignment. the alignment was edited with espript for better visualization. pymol [ ] and mega [ ] were used to construct a phylogenetic tree after sequence alignment. to build the phylogenetic tree, the sequences of twenty-seven ( ) lpmo genes (edited to remove the n-terminal signal sequence, c-terminal extension or gpi anchor, and cbm module) were taken from different species belonging to the aa and aa families of lpmos. the neighbor-joining tree was constructed with bootstrap replications. cloning of aflpmo aspergillus fumigatus nitdgpka was grown on cmc agar medium containing % cmc, . % peptone, and % agar in basal medium ( . % nano , . % kcl, . % mgso , . % feso , . % k hpo ). the fungal biomass was then ground with a pestle and mortar, followed by rapid vortexing in an appropriate lysis buffer to lyse the cells. genomic dna was isolated from the fungal biomass using dna extraction buffer ( mm tris-hcl, mm nacl, . m edta, % sds) followed by phenol, chloroform and isoamyl alcohol ( : : ) extraction. the final pellet was washed with % alcohol, air-dried, and dissolved in sterile water. the aflpmo gene was amplified by polymerase chain reaction (pcr). the codon-optimized gene for pichia pastoris was inserted into the ppiczαa vector (invitrogen, carlsbad, california, usa).
the gene was cloned with the native signal sequence and x his-tag at the c-terminal [ ]. the cloning was done by following the same protocol as aaaa and pmo a_mlaci [ , ]. the vector (ppiczαa) containing the aflpmo gene was linearized by pme (new england biolabs) and transformed to pichia pastoris x competent cells. the zeocin resistant transformants were picked and screened for protein production. the cloned gene was further confirmed by sequencing and the sequence submitted to genbank (genbank accession no. mt ). expression and purification of aflpmo the positive colonies were selected on ypds (zeocin: μg/ml) plates. the positive transformants were further screened by the colony pcr and expression studies. protein expression was carried out initially in bmgy media containing ml/l pichia trace minerals (ptm ) salt ( g/l cuso · h o, g/l mnso ·h o, . g/l na moo · h o, . g/l h bo , . g/l caso · h o, . g/l cocl , . g/l znso · h o, g/l feso · h o, nai . g/l, h so ml/l) and . g/l of biotin. then after hours, pichia cells were transferred into bmmy medium (ptm salt) with continuous induction by the addition of % methanol (optimized) every day (after every hours) for three days. after three days, the culture media was spun down ( , rpm for mins) at c. the pellet was discarded, and the media was collected. the protein was precipitated from the media by ammonium sulfate precipitation ( % saturation). the pellet was redissolved in tris buffer (tris-hcl mm ph- . , nacl- mm, imidazole- mm). the recombinant protein was purified by immobilized ion affinity chromatography (ni-nta affinity chromatography)[ ], followed by dialysis with mm phosphate buffer, ph . . we followed the expression and purification procedure, same as aaaa [ ]. the yield of the purified protein was almost . mg/ml. the concentration was measured by bradford assay, and bsa was used for standard concentration. the protein was separated by sds-page using % acrylamide in resolving gel(dh o- . 
ml, acrylamide+bisacrylamide – . ml, . m tris- . ml, %sds- . ml, % aps- . ml, temed- . ml; for ml), stained with coomassie blue, and the purified protein band was also confirmed by western blot analysis using an anti-his antibody (abcam). biochemical assays of aflpmo for the biochemical characterization of aflpmo , dmp ( , -dimethoxyphenol) was used as the substrate. the reaction was carried out in phosphate buffer ( mm ph . ) containing mm , -dimethoxyphenol, μm hydrogen peroxide, and μg of purified aflpmo at °c. the amount of the product, -coerulignone, was measured with a spectrophotometer using the standard extinction coefficient ( m- cm- ) and the beer-lambert law. for the kinetic assay, different , -dimethoxyphenol concentrations ( mm, mm, mm, mm, mm, mm, mm, mm, mm and mm) were used. the kinetic parameters were calculated from the lineweaver-burk plot (lb plot). one unit of enzyme activity is defined as the amount of enzyme that releases μm of -coerulignone (product) per minute under standard reaction conditions. polysaccharides depolymerization by aflpmo different cellulosic compounds, namely pasc, avicel®ph- (sigma), and carboxymethyl cellulose (cmc), were used. we used % avicel®ph- (sigma) (crystalline cellulose) and % cmc (carboxymethyl cellulose sodium salt) with different concentrations of purified aflpmo for different incubation times. reducing sugar was determined by the dinitrosalicylic acid (dns) assay. for the pasc assay, we incubated . % pasc with increasing concentrations of aflpmo for hours, measured the od after hrs of incubation, and plotted the relative absorbance ([od of aflpmo treated pasc]-[od of untreated substrate]) against enzyme concentration [ ].
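the kinetic analysis described in the biochemical assays above (beer-lambert conversion of absorbance to product concentration, then a lineweaver-burk fit) can be sketched roughly as follows. this is a minimal illustration and not the authors' analysis script: the function names, the synthetic concentrations and rates, and the assumed vmax and km are all hypothetical, chosen only so that the fit can be checked against known values.

```python
# Illustrative sketch only: the substrate concentrations, rates, and
# extinction coefficient used below are synthetic, assumed values and
# are NOT data from the study.


def concentration_from_absorbance(absorbance, epsilon, path_cm=1.0):
    """Beer-Lambert law: A = epsilon * c * l, so c = A / (epsilon * l)."""
    return absorbance / (epsilon * path_cm)


def lineweaver_burk_fit(substrate, rates):
    """Estimate (Vmax, Km) from a least-squares line through (1/[S], 1/v).

    The double-reciprocal form of Michaelis-Menten kinetics is
    1/v = (Km/Vmax) * (1/[S]) + 1/Vmax,
    so slope = Km/Vmax and intercept = 1/Vmax.
    """
    if len(substrate) != len(rates) or len(rates) < 2:
        raise ValueError("need at least two paired measurements")
    x = [1.0 / s for s in substrate]  # 1/[S]
    y = [1.0 / v for v in rates]      # 1/v
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / intercept
    km = slope * vmax
    return vmax, km


def michaelis_menten(s, vmax, km):
    """Reaction rate predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


# Hypothetical check: noise-free rates generated from assumed parameters.
assumed_vmax, assumed_km = 10.0, 2.0
concentrations = [0.5, 1.0, 2.0, 5.0, 10.0]
rates = [michaelis_menten(s, assumed_vmax, assumed_km) for s in concentrations]
vmax_est, km_est = lineweaver_burk_fit(concentrations, rates)
print(f"Vmax = {vmax_est:.3f}, Km = {km_est:.3f}")
```

because the double-reciprocal transform weights low-substrate points heavily, a direct nonlinear fit of the michaelis-menten equation is often preferred for noisy data; the linear form is shown here only because it is the method named in the text.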
biomass and cellulose hydrolysis by cellulase and aflpmo cellulose and lignocellulose (alkaline pre-treated raw rice straw) [ ] were used to determine the cellulose hydrolysis enhancing capacity. rice straw was pre-treated with % naoh ( : w/v ratio) at °c at psi pressure for hour, and sodium azide ( %) μl (per ml) was added to the reaction mixture to prevent microbial contamination. the reaction was performed at °c, and the amount of reducing sugar was quantified after hours, hours, hours, and hours by the dinitrosalicylic acid (dns) assay. μl of commercial cellulase (mp biomedicals llc) ( mg/ml) was used along with two different amounts of aflpmo , μl ( μg) and μl ( μg) [concentration . mg/ml]. reaction sets were prepared using cellulase only, aflpmo only at different concentrations, aflpmo combined with cellulase, and cellulase with inactivated aflpmo . aflpmo was heat-inactivated by keeping it at °c for minutes. reducing sugar from each set of triplicates was quantified. for cellulose degradation, μl ( %) of avicel (sigma) was incubated with μl of commercial cellulase (mp biomedicals llc) ( mg/ml). reducing sugar was quantified after hours of incubation. for these biochemical assays, we used mm phosphate buffer (ph- . ), and heat-inactivated aflpmo was taken as a negative control. molecular modeling and molecular docking the i-tasser [ ] server was used to model aflpmo . the final model was energy minimized with gromacs [ ]. the ramachandran plot [ ] and procheck [ ] were used to evaluate the final model.
the metal ion-binding site prediction and docking server (mib server; http://bioinfo.cmu.edu.tw/mib/) was used to identify the position of the copper (cu) ion. molecular docking was performed with autodock vina [ ] using mgl tools (molecular graphics laboratory). the optimized substrate structures were prepared with autodock vina and saved in pdbqt format. the grid size parameters used for docking were , , , and the grid center parameters were , , and . the genetic algorithm was also used for docking. molecular interactions between enzyme and substrate were analyzed with mgl tools [ ]. the electrostatic potential surface of aflpmo was calculated with the apbs plugin available in pymol at ph . . acknowledgments mh is thankful to dbt, and srd is grateful to dst inspire for their fellowship. the authors are also thankful for the dst-fist grant of the department of biotechnology, nit durgapur. funding this study was financially supported by the dbt, govt. of india (grant no. bt/pr / pbd/ / / ). authors’ contribution mh and sm designed the research work. mh, bsk, and sm wrote the manuscript. mh performed biochemical assays. srd performed in-silico analysis. mh and ka analyzed the results. all authors read and approved the manuscript. conflict of interest authors have no competing interests. the manuscript has been spell-checked, grammar checked and plagiarism-checked by “grammarly.” ethical approval no human participants or animals were used in this study. references . dias de oliveira me, vaughan be, rykiel ej ( ) ethanol as fuel: energy, carbon dioxide balances, and ecological footprint. bioscience. https://doi.org/ . / - ( ) [ :eafecd] . .co; .
saricks c, santini d, wang m ( ) effects of fuel ethanol use on fuel-cycle energy and greenhouse gas emissions . x. lang, d. g. macdonald, g. a. hil ( ) recycle bioreactor for bioethanol production from wheat starch ii. fermentation and economics. energy sources. https://doi.org/ . / . somerville c, bauer s, brininstool g, et al ( ) toward a systems approach to understanding plant cell walls. science ( -. ). . forsberg z, vaaje-kolstad g, westereng b, et al ( ) cleavage of cellulose by a cbm protein. protein sci. https://doi.org/ . /pro. . phillips cm, beeson wt, cate jh, marletta ma ( ) cellobiose dehydrogenase and a copper-dependent polysaccharide monooxygenase potentiate cellulose degradation by neurospora crassa. acs chem biol. https://doi.org/ . /cb . quinlan rj, sweeney md, lo leggio l, et al ( ) insights into the oxidative degradation of cellulose by a copper metalloenzyme that exploits biomass components. proc natl acad sci. https://doi.org/ . /pnas. . johansen ks ( ) discovery and industrial applications of lytic polysaccharide monooxygenases. biochem soc trans. https://doi.org/ . /bst (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . . . beeson wt, vu v v., span ea, et al. ( ) cellulose degradation by polysaccharide monooxygenases. annu rev biochem. https://doi.org/ . /annurev-biochem- - . vermaas j v., crowley mf, beckham gt, payne cm ( ) effects of lytic polysaccharide monooxygenase oxidation on cellulose structure and binding of oxidized cellulose oligomers to cellulases. j phys chem b. https://doi.org/ . /acs.jpcb. b . forsberg z, vaaje-kolstad g, westereng b, et al ( ) cleavage of cellulose by a cbm protein. protein sci. https://doi.org/ . /pro. . 
vermaas j v., crowley mf, beckham gt, payne cm ( ) effects of lytic polysaccharide monooxygenase oxidation on cellulose structure and binding of oxidized cellulose oligomers to cellulases. j phys chem b. https://doi.org/ . /acs.jpcb. b . harris p v., welner d, mcfarland kc, et al ( ) stimulation of lignocellulosic biomass hydrolysis by proteins of glycoside hydrolase family : structure and function of a large, enigmatic family. biochemistry. https://doi.org/ . /bi p . nakagawa ys, kudo m, loose jsm, et al ( ) a small lytic polysaccharide monooxygenase from streptomyces griseus targeting α- and β-chitin. febs j. https://doi.org/ . /febs. . crouch li, labourel a, walton ph, et al ( ) the contribution of non-catalytic carbohydrate-binding modules to the activity of lytic polysaccharide monooxygenases. j biol chem. https://doi.org/ . /jbc.m . . chabbert b, habrant a, herbaut m, et al ( ) action of lytic polysaccharide monooxygenase on plant tissue is governed by cellular type. sci rep. https://doi.org/ . /s - - - . liu b, krishnaswamyreddy s, muraleedharan mn, et al ( b) side-by-side biochemical comparison of two lytic polysaccharide monooxygenases from the white- rot fungus heterobasidion irregulare on their activity against crystalline cellulose and glucomannan. plos one. https://doi.org/ . /journal.pone. . filiatrault-chastel c, navarro d, haon m, et al ( ) aa , a new lytic polysaccharide monooxygenase family identified in fungal secretomes. biotechnol biofuels. https://doi.org/ . /s - - -y (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . . sarkar n, aikat k ( ) aspergillus fumigatus nitdgpka provides for increased cellulase production. int j chem eng :. https://doi.org/ . / / . 
breslmayr e, hanžek m, hanrahan a, et al ( ) a fast and sensitive activity assay for lytic polysaccharide monooxygenase. biotechnol biofuels. https://doi.org/ . /s - - - . altschul sf, gish w, miller w, et al ( ) basic local alignment search tool. j mol biol. https://doi.org/ . /s - ( ) - . sievers f, higgins dg ( ) clustal omega, accurate alignment of very large numbers of sequences. methods mol biol. https://doi.org/ . / - - - - _ . delano w. . ( ) pymol: an open-source molecular graphics tool. ccp newsl protein crystallogr . kumar s, stecher g, tamura k ( ) mega : molecular evolutionary genetics analysis version . for bigger datasets. mol biol evol. https://doi.org/ . /molbev/msw . basotra n, dhiman ss, agrawal d, et al ( ) characterization of a novel lytic polysaccharide monooxygenase from malbranchea cinnamomea exhibiting dual catalytic behavior. carbohydr res. https://doi.org/ . /j.carres. . . . bennati-granier c, garajova s, champion c, et al ( ) substrate specificity and regioselectivity of fungal aa lytic polysaccharide monooxygenases secreted by podospora anserina to cite this version�: substrate specificity and regioselectivity of fungal aa lytic polysaccharide monooxygenases secreted by pod. biotechnol biofuels. https://doi.org/ . /s - - - . hansson h, karkehabadi s, mikkelsen n, et al ( ) high-resolution structure of a lytic polysaccharide monooxygenase from hypocrea jecorina reveals a predicted linker as an integral part of the catalytic domain. j biol chem : – . https://doi.org/ . /jbc.m . . yoswathana ( ) bioethanol production from rice straw. energy res j : – . https://doi.org/ . /erjsp. . . . zhang r, liu y, zhang y, et al ( ) identification of a thermostable fungal lytic polysaccharide monooxygenase and evaluation of its effect on lignocellulosic degradation. appl microbiol biotechnol : – . https://doi.org/ . /s - - - . pronk s, páll s, schulz r, et al ( ) gromacs . 
: a high-throughput and highly (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . parallel open source molecular simulation toolkit. bioinformatics. https://doi.org/ . /bioinformatics/btt . gopalakrishnan k, sowmiya g, sheik ss, sekar k ( ) ramachandran plot on the web ( . ). protein pept lett . laskowski ra, macarthur mw, moss ds, thornton jm ( ) procheck: a program to check the stereochemical quality of protein structures. j appl crystallogr. https://doi.org/ . /s . trott o, olson aj ( ) software news and update autodock vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. j comput chem. https://doi.org/ . /jcc. . morris gm, huey r, lindstrom w, et al ( ) autodock and autodocktools : automated docking with selective receptor flexibility. j comp chem. https://doi.org/ . /jcc. . agrawal, d., kaur, b., kaur brar, k., chadha, b.s., . an innovative approach of priming lignocellulosics with lytic polysaccharide monooxygenases prior to saccharification with glycosyl hydrolases can economize the second-generation ethanol process.bioresour. technol. , . https://doi.org/ . /j.biortech. . . jensen ms, klinkenberg g, bissaro b, et al ( ) engineering chitinolytic activity into a cellulose-active lytic polysaccharide monooxygenase provides insights into substrate specificity. j biol chem. https://doi.org/ . /jbc.ra . . zhou x, zhu h ( ) current understanding of substrate specificity and regioselectivity of lpmos. bioresour bioprocess :. https://doi.org/ . /s - - - . forsberg z, mackenzie ak, sørlie m, et al ( ) structural and functional characterization of a conserved pair of bacterial cellulose-oxidizing lytic polysaccharide monooxygenases. proc natl acad sci u s a. https://doi.org/ . /pnas. . 
hansson h, karkehabadi s, mikkelsen n, et al ( ) high-resolution structure of a lytic polysaccharide monooxygenase from hypocrea jecorina reveals a predicted linker as an integral part of the catalytic domain. j biol chem : – . https://doi.org/ . /jbc.m . . eibinger, m., ganner, t., bubner, p., rošker, s., kracher, d., haltrich, d., ludwig, r., plank, h., nidetzky, b., . cellulose surface degradation by a lytic polysaccharide monooxygenase and its effect on cellulase hydrolytic efficiency. j. biol. chem. https://doi.org/ . /jbc.m . (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . . kim, i.j., youn, h.j., kim, k.h., . synergism of an auxiliary activity (aa ) from chaetomium globosum with xylanase on the hydrolysis of xylan and lignocellulose. process biochem. , – . https://doi.org/ . /j.procbio. . . . kim, i.j., jung, j.y., lee, h.j., park, h.s., jung, y.h., park, k., kim, k.h., . customized optimization of cellulase mixtures for differently pre-treated rice straw. bioprocess biosyst. eng. , – . https://doi.org/ . /s - - - . zhang r, liu y, zhang y, et al ( ) identification of a thermostable fungal lytic polysaccharide monooxygenase and evaluation of its effect on lignocellulosic degradation. appl microbiol biotechnol : – . https://doi.org/ . /s - - - . hemsworth, g.r., johnston, e.m., davies, g.j., walton, p.h., . lytic polysaccharide monooxygenases in biomass conversion. trends biotechnol. xx, – . https://doi.org/ . /j.tibtech. . . . dimarogona, m., topakas, e., olsson, l., christakopoulos, p., . bioresource technology lignin boosts the cellulase performance of a gh- enzyme from sporotrichum thermophile. bioresour. technol. , – . https://doi.org/ . /j.biortech. . . . 
liu b, kognole aa, wu m, et al ( a) structural and molecular dynamics studies of a c -oxidizing lytic polysaccharide monooxygenase from heterobasidion irregulare reveal amino acids important for substrate recognition. : – . https://doi.org/ . /febs. . kim, i.j., youn, h.j., kim, k.h., . synergism of an auxiliary activity (aa ) from chaetomium globosum with xylanase on the hydrolysis of xylan and lignocellulose. process biochem. , – . https://doi.org/ . /j.procbio. . . . corrêa tlr, júnior at, wolf ld, et al ( ) an actinobacteria lytic polysaccharide monooxygenase acts on both cellulose and xylan to boost biomass saccharification. biotechnol biofuels : – . https://doi.org/ . /s - - - . hu j, tian d, renneckar s, saddler jn ( ) enzyme mediated nanofibrillation of cellulose by the synergistic actions of an endoglucanase, lytic polysaccharide monooxygenase (lpmo) and xylanase. sci rep : – . https://doi.org/ . /s - - - (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . . müller g, várnai a, johansen ks, et al ( ) harnessing the potential of lpmo-containing cellulase cocktails poses new demands on processing conditions. biotechnol biofuels. https://doi.org/ . /s - - -y . pantoom s, songsiriritthigul c, suginta w ( ) the effects of the surface-exposed residues on the binding and hydrolytic activities of vibrio carchariae chitinase a. bmc biochem : – . https://doi.org/ . / - - - figure legends figure expression and purification of aflpmo (marked with red arrow). sds page analysis; lane , flow-through, lane , & wash, lane & . purified aflpmo : western blot analysis using purified protein presented in lane & of sds page marked as lane w and w figure enzyme kinetics studies of aflpmo with , -dmp (mean values are plotted). 
(a) chemical reaction to convert , dmp to -coerulignone; od at nm vs. time plot. (b) lb plot or /v vs /[s] plot. figure in silico analysis of aflpmo . (a) schematic diagram of aflpmo ; signal peptide: amino acids, catalytic domain: - amino acids, and a serine-rich domain: - amino acids. (b) multiple sequence alignment of aa lpmos, c oxidizing, and c /c oxidizing aa lpmos: conserved sequences are highlighted. the red arrow indicates the amino acid responsible for regioselectivity; the black arrow represents the amino acid responsible for substrate specificity, the black box represents the aa conserved motif. (c) the electrostatic surface potential of aflpmo model structure at ph . , blue and red color represents positive and negative potential surface respectively. the area surrounded by the ring represents the catalytic site. figure phylogenetic relationship of aflpmo with aa lpmos. a neighbor-joining tree from mega showing c (bacterial) & c (fungal) clades and c clade further divided into c . ( penicillium & other ) & c . (aspergillus) subclades. figure model structure and molecular docking of aflpmo . (a) predicted three- dimensional models of the aflpmo showing functional loops ls(orange), l (blue), (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . l (green), lc(magenta) loops surrounding the copper active site. (b) histidine brace (his , his ) of aflpmo surrounding the copper metal. (c) amino acids involved in substrate binding: gln , gln , ser , his , his , asn , asp , tyr , glu figure polysaccharides degradation activity of aflpmo . (a) cmc depolymerization: estimation of reducing sugar with the increasing amount of aflpmo . (b) pasc hydrolysis: relative absorbance at nm vs. aflpmo quantity plot. 
results are the mean value of the minimum three experiments. the bar represents the standard deviation (sd) figure boosting effect of aflpmo . (a) hydrolysis of alkali pre-treated rice straw: light- grey bar indicates only cellulase and deep-grey indicates heat inactive aflpmo with cellulase, dark-grey and black bar indicates cellulase along with two different quantity of aflpmo . (b) avicel hydrolysis: reducing sugar estimation. light-grey bar indicates only cellulase and deep-grey indicates heat inactive aflpmo with cellulase, dark-grey and black bar indicates cellulase along with two different quantities of aflpmo . (c) synergistic effect: light-grey bars indicate biomass hydrolysis by two different concentrations of aflpmo ; dark-grey bar indicating the only cellulase treated biomass and black bar indicating combined treated biomass with aflpmo & cellulase. error bars represent the standard deviation of experiments ran in triplicate. the different number of asterisks (*) indicate a significant difference between glucose release in the presence of aflpmo by one-way anova followed by student's t-test (p<< . ). enzyme kinetics parameter values vmax in u/mg . km in mm . kcat in min - . table : enzyme kinetics of afaa with , , dmp as a substrate. (which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . substrates (biomass) cellulases lpmos fold increase % increase references wheat straw celluclast (novozymes) stcel a (aa ) - % [ ] corn stover celluclast (novozymes) taaa % [ ] raw rice straw celluclast (novozymes) cgaa . - . - [ ] raw rice straw cellulase (mp biomedicals) aflpmo ~ % - table : lignocellulosic biomass hydrolysis enhancement by lpmos (which was not certified by peer review) is the author/funder. all rights reserved. 
; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . structural basis for broad coronavirus neutralization structural basis for broad coronavirus neutralization maximilian m. sauer , m. alexandra tortorici , , young-jun park , alexandra c. walls , leah homad , oliver acton , john bowen , chunyan wang , xiaoli xiong $, willem de van der schueren †, joel quispe , benjamin g. hoffstrom , berend-jan bosch , andrew t. mcguire , , *, david veesler * department of biochemistry, university of washington, seattle, washington , usa. institut pasteur, unité de virologie structurale, paris, france; cnrs umr , unité de virologie structurale, paris, france. vaccine and infectious disease division, fred hutchinson cancer research center, seattle, wa virology division, department of infectious diseases and immunology, faculty of veterinary medicine, utrecht university, utrecht, the netherlands. clinical research division, fred hutchinson cancer research center, seattle, wa, usa antibody technology resource, fred hutchinson cancer research center, seattle, wa department of global health, university of washington, seattle, wa , usa. department of laboratory medicine and pathology, university of washington, seattle, wa , usa. $present address: guangzhou regenerative medicine and health - guangdong laboratory, guangzhou institutes of biomedicine and health, chinese academy of sciences, guangzhou, china †present address: bluebird bio, seattle, wa, usa *correspondence: dveesler@uw.edu, amcguire@fredhutch.org .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . 
/ three highly pathogenic β-coronaviruses crossed the animal-to-human species barrier in the past two decades: sars-cov, mers-cov and sars-cov- . sars- cov- has infected more than million people worldwide, claimed over . million lives and is responsible for the ongoing covid- pandemic. we isolated a monoclonal antibody, termed b , cross-reacting with eight β-coronavirus spike glycoproteins, including all five human-infecting β-coronaviruses, and broadly inhibiting entry of pseudotyped viruses from two coronavirus lineages. cryo- electron microscopy and x-ray crystallography characterization reveal that b binds to a conserved cryptic epitope located in the fusion machinery and indicate that antibody binding sterically interferes with spike conformational changes leading to membrane fusion. our data provide a structural framework explaining b cross-reactivity with β-coronaviruses from three lineages along with proof-of- concept for antibody-mediated broad coronavirus neutralization elicited through vaccination. this study unveils an unexpected target for next-generation structure- guided design of a pan-coronavirus vaccine. introduction four coronaviruses mainly associated with common cold-like symptoms are endemic in humans, namely oc , hku , nl and e, whereas three highly pathogenic zoonotic coronaviruses emerged in the past two decades leading to epidemics and a pandemic. severe acute respiratory syndrome coronavirus (sars-cov) was discovered in the guangdong province of china in and spread to five continents through air travel routes, infecting , people and causing deaths, with no cases reported after (drosten et al., ; ksiazek et al., ). in , middle-east respiratory syndrome coronavirus (mers-cov) emerged in the arabian peninsula, where it still circulates, and was exported to countries, infecting a total of ~ , individuals and claiming lives as of january according to the world health .cc-by-nc-nd . 
international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / organization(zaki et al., ). a recent study further suggested that undetected zoonotic mers-cov transmissions are currently occurring in africa(mok et al., ). a novel coronavirus, named sars-cov- , was associated with an outbreak of severe pneumonia in the hubei province of china at the end of and has since infected over million people and claimed more than . million lives worldwide during the ongoing covid- pandemic(zhou et al., ; zhu et al., b). sars-cov and sars-cov- likely originated in bats(ge et al., ; hu et al., ; li et al., ; yang et al., ; zhou et al., ) with masked palm civets and racoon dogs acting as intermediate amplifying and transmitting hosts for sars- cov(guan et al., ; kan et al., ; wang et al., ). although mers-cov was also suggested to have originated in bats, repeated zoonotic transmissions occurred from dromedary camels(haagmans et al., ; memish et al., ). the identification of numerous coronaviruses in bats, including viruses related to sars-cov- , sars-cov and mers-cov, along with evidence of spillovers of sars-cov-like viruses to humans strongly indicate that future coronavirus emergence events will continue to occur(anthony et al., ; ge et al., ; hu et al., ; li et al., ; li et al., ; menachery et al., ; menachery et al., ; wang et al., ; yang et al., ; zhou et al., ). 
the coronavirus spike (s) glycoprotein mediates entry into host cells and comprises two functional subunits mediating attachment to host receptors (s subunit) and membrane fusion (s subunit) (ke et al., ; kirchdoerfer et al., ; turoňová et al., ; walls et al., b; walls et al., a; walls et al., ; wrapp et al., ). as the s homotrimer is prominently exposed at the viral surface and is the main target of neutralizing antibodies (abs), it is a focus of therapeutic and vaccine design efforts (tortorici and veesler, ). we previously showed that the sars-cov- receptor-binding domain (rbd, part of the s subunit) is immunodominant, comprises multiple distinct antigenic sites, and is the target of % of the neutralizing activity present in covid- convalescent plasma (piccoli et al., ). accordingly, monoclonal abs (mabs) with potent neutralizing activity were identified against the sars-cov- , sars-cov and mers-cov rbds and shown to protect against viral challenge in vivo (alsoussi et al., ; barnes et al., a; barnes et al., b; brouwer et al., ; corti et al., ; hansen et al., ; hassan et al., a; liu et al., ; piccoli et al., ; pinto et al., ; rockx et al., ; rockx et al., ; rogers et al., ; seydoux et al., ; tortorici et al., ; walls et al., ; wang et al., a; zost et al., ). the isolation of s from a recovered sars-cov individual which neutralizes sars-cov- and sars-cov through recognition of a conserved rbd epitope demonstrated that potent neutralizing mabs could inhibit β-coronaviruses belonging to different lineage b (sarbecovirus) clades (pinto et al., ).
an optimized version of s is currently under evaluation in phase clinical trials in the us. whereas a few other sars-cov- cross-reactive mabs have been identified from either sars-cov convalescent sera (huo et al., ; ter meulen et al., ; wec et al., ; yuan et al., ) or immunization of transgenic mice (wang et al., a), the vast majority of sars-cov- s-specific mabs isolated exhibit narrow binding specificity and neutralization breadth. although the covid- pandemic has accelerated the development of sars-cov- vaccines at an unprecedented pace (case et al., ; corbett et al., ; folegatti et al., ; hassan et al., b; jackson et al., ; mulligan et al., ; sahin et al., ; walls et al., a; yu et al., ; zhu et al., a), worldwide deployment to achieve community protection is expected to take many more months. based on available data, it appears unlikely that infection or vaccination will provide durable pan-coronavirus protection due to the immunodominance of the rbd and waning of ab responses, leaving the human population vulnerable to the emergence of genetically distinct coronaviruses (edridge et al., ; piccoli et al., ). the availability of mabs and other reagents cross-reacting with and broadly neutralizing distantly related coronaviruses is key for pandemic preparedness to enable detection, prophylaxis and therapy against zoonotic pathogens that might emerge in the future. we report the isolation of a mab cross-reacting with the s-glycoprotein of at least eight β-coronaviruses from lineages a, b and c, including all five human-infecting β-coronaviruses. this mab, designated b , broadly inhibits entry of viral particles pseudotyped with the s glycoprotein of lineage c (mers-cov and hku ) and lineage a
(oc ) coronaviruses, providing proof-of-concept of mab-mediated broad β-coronavirus neutralization. a cryoem structure of mers-cov s bound to b reveals that the mab recognizes a linear epitope in the stem helix within a highly dynamic region of the s fusion machinery. crystal structures of b in complex with mers-cov s, sars-cov/sars-cov- s, oc s and hku s stem helix peptides combined with binding assays reveal an unexpected binding mode to a cryptic epitope, delineate the molecular basis of cross-reactivity and rationalize observed binding affinities for distinct coronaviruses. collectively, our data indicate that b sterically interferes with s conformational changes leading to membrane fusion and identify a key target for next-generation structure-guided design of a pan-coronavirus vaccine.

results

isolation of a broadly neutralizing coronavirus mab

to elicit cross-reactive abs targeting conserved coronavirus s epitopes, we immunized mice twice with the prefusion-stabilized mers-cov s ectodomain trimer and once with the prefusion-stabilized sars-cov s ectodomain trimer (figure a). we subsequently generated hybridomas from immunized animals and implemented a selection strategy to identify those secreting abs recognizing both mers-cov s and sars-cov s but not their respective s subunits (which are much less conserved than the s subunit (walls et al., b; walls et al., a)), the shared foldon trimerization domain or the his tag. we identified and sequenced a mab, designated b , that bound prefusion mers-cov s (lineage c) and sars-cov s (lineage b) trimers, the two immunogens used, as well as sars-cov- s (lineage b) and oc s (lineage a) trimers with nanomolar to picomolar avidities.
specifically, b bound most tightly to mers-cov s (figure b), followed by oc s (with one order of magnitude lower apparent affinity, figure c) and sars-cov/sars-cov- s (with three orders of magnitude reduced apparent affinity, figure d-e). these results show that b is a broadly reactive mab recognizing at least four distinct s glycoproteins distributed across three lineages of the β-coronavirus genus. to evaluate the neutralization potency and breadth of b , we assessed s-mediated entry into cells of either vesicular stomatitis virus (vsv) (kaname et al., ) or murine leukemia virus (mlv) (millet and whittaker, ; walls et al., b) pseudotyped with mers-cov s, oc s, sars-cov s, sars-cov- s and hku s in the presence of varying concentrations of mab. we determined half-maximal inhibitory concentrations of . ± . µg/ml, . ± . µg/ml and . ± . µg/ml for mers-cov s, oc s and hku s pseudotyped viruses, respectively (figure f-g) whereas no neutralization was observed for sars-cov s and sars-cov- s (figure s ). b therefore broadly neutralizes s-mediated entry of pseudotyped viruses harboring β-coronavirus s glycoproteins from lineages a and c, but not from lineage b, putatively due to lower-affinity binding (figure b-e).
figure . identification and characterization of a cross-reactive and broadly neutralizing coronavirus mab. (a) mouse immunization and b mab selection scheme. mers-cov and sars-cov s subunits fused to human fc and the respiratory syncytial virus fusion glycoprotein (rsv f) ectodomain trimer fused to a foldon and a his-tag were used as decoys during selection. (b-e) binding of mers-cov s (b), oc s (c), sars-cov s (d) and sars-cov- s (e) ectodomain trimers to the b mab immobilized at the surface of biolayer interferometry biosensors. data were analyzed with the fortebio software, and global fits are shown as dashed lines. the vertical dotted lines correspond to the transition between the association and dissociation phases. approximate apparent equilibrium dissociation constants (kd, app) are reported due to the binding avidity resulting from the trimeric nature of s glycoproteins. (f-h) b -mediated neutralization of vsv particles pseudotyped with mers-cov s (f), oc s (g) and hku s (h). data were evaluated using a non-linear sigmoidal regression model with variable hill slope. fit is shown as dashed lines and experiments were performed in triplicate with at least two independent mab and pseudotyped virus preparations.

b targets a linear epitope in the fusion machinery

to identify the epitope recognized by b , we determined a cryo-em structure of the mers-cov s glycoprotein in complex with the b fab fragment at . Å overall resolution (figure a-b, figure s and table ). d classification of the cryoem data revealed incomplete fab saturation, with one to three b fabs bound to the mers-cov s trimer, and a marked conformational dynamic of bound b fabs, yielding a continuum of conformations. although these two factors compounded local resolution of the s/b interface, we identified that the b epitope resides in the stem helix (i.e.
downstream from the connector domain and before the heptad-repeat region) within the s subunit (so-called fusion machinery) (figure a-b). our d reconstructions further suggest that b binding disrupts the stem helix quaternary structure, which is presumed to form a -helix bundle (observed in the nl s (walls et al., b) and sars-cov/sars-cov- s structures (gui et al., ; kirchdoerfer et al., ; walls et al., b; walls et al., ; wrapp et al., ; yuan et al., )) but not maintained in the b -bound mers-cov s structure (figure a). based on our cryoem structure, we identified a conserved residue sequence at the c-terminus of the last residue resolved in previously reported mers-cov s structures (pallesen et al., ; park et al., ; walls et al., ; yuan et al., ) and confirmed by biolayer interferometry that it encompasses the b epitope using synthetic mers-cov s biotinylated peptides (figure c-e and figure s ). we further found that b bound to the corresponding stem helix peptides from all known human-infecting β-coronaviruses: sars-cov- and sars-cov (the sequence is strictly conserved among the two viruses), oc and hku , as well as mouse hepatitis virus and two mers-cov-related bat viruses (hku and hku ), in mab and fab formats (figure d-e). b interacted most efficiently with the mers-cov s peptide, likely due to its major role in elicitation of this mab, followed by all other coronavirus peptides tested, which bound with comparable affinities, except for hku which interacted more weakly than other stem helix peptides.

figure .
b targets a linear epitope in the coronavirus s fusion machinery. (a-b) molecular surface representation of a composite model of the b -bound mers-cov s cryoem structure and of the b -bound mers-cov s stem helix peptide crystal structure shown from the side (a) and viewed from the viral membrane (b). mers-cov s protomers are colored pink, cyan and gold and the b fab heavy and light chains are colored purple and magenta, respectively. the composite model was generated by docking the crystal structure of b bound to the mers-cov stem helix in the cryoem map. (c) identification of a conserved residue sequence spanning the stem helix. residue numbering for mers-cov s and sars-cov- s are indicated on top and bottom of the alignment, respectively. (d) binding of . µm b mab or (e) µm b fab to biotinylated coronavirus s stem helix peptides immobilized at the surface of biolayer interferometry biosensors.

b recognizes a conserved epitope in the stem helix

to obtain an atomic-level understanding of the broad b cross-reactivity, we determined five crystal structures of the b fab in complex with peptide epitopes derived from mers-cov s (residues - or - ), sars-cov s (residues - ), sars-cov- s (residues - ), oc s (residues - ) and hku s (residues - ), at resolutions ranging from . to . Å (figure a-f, figure s and table ). in all five structures, the stem helix epitope folds as an amphipathic α-helix resolved for residues - (mers-cov s numbering) irrespective of the peptide length used for co-crystallization.
b interacts with the helical epitope through shape complementarity, hydrogen-bonding and salt bridges using complementarity determining regions cdrh -h , framework region , cdrl and cdrl to bury ~ Å at the paratope/epitope interface. the stem helix docks its hydrophobic face, lined by residues f mers-cov, l mers-cov, f mers-cov and f mers-cov, into a hydrophobic groove formed by b heavy chain residues y , w , v and l as well as light chain y (figure c and a, b and d). moreover, b binding leads to the formation of a salt bridge triad, involving residue d mers-cov, cdrh residue r and cdrl residue h . comparison of the b -bound structures of mers-cov, hku , sars-cov/sars-cov- and oc s stem helix peptides explains the broad mab cross-reactivity with β-coronavirus s glycoproteins as shape complementarity is maintained through strict conservation of out of hydrophobic residues whereas f mers-cov is conservatively substituted with y sars-cov/y sars-cov- or w oc /w hku (our structures demonstrate that all three aromatic side chains are accommodated by b ). furthermore, the d mers-cov-mediated salt bridge triad is preserved, including with a non-optimal e hku side chain, with the exception of s hku which abrogates these interactions and explains the dampened b binding to the hku peptide (figure c-e and b-f). b heavy chain residue l and cdrl residue h are mutated from germline and make major contributions to epitope recognition, highlighting the key contribution of affinity maturation to the cross-reactivity of this mab.

figure .
molecular basis for the broad b cross-reactivity with a conserved coronavirus stem helix peptide. (a) crystal structure of the b fab (surface rendering) in complex with the mers-cov s stem helix peptide. (b-c) crystal structures of the b fab bound to the mers-cov s (b) or hku s (c) stem helix reveal a conserved network of interactions except for the substitution of d mers-cov with e hku which preserves the salt bridge triad formed with cdrh residue r and cdrl residue h . (d-f) crystal structures of the b fab bound to the mers-cov s (d), oc s (e) or sars-cov/sars-cov- s (f) stem helix showcasing the conservation of the paratope/epitope interface except for the conservative substitution of f mers-cov with w oc or y sars-cov/y sars-cov- . the b heavy and light chains are colored purple and magenta, respectively, and only selected regions are shown in panels (b-f) for clarity. the coronavirus s stem helix peptides are rendered in ribbon representation and colored gold with interacting side chains shown in stick representation.

mechanism of b -mediated neutralization

we set out to elucidate the molecular basis of the b -mediated broad neutralization of multiple coronaviruses from lineages a and c and lack of inhibition of lineage b coronaviruses. our biolayer interferometry data indicate that although the b mab efficiently interacted with the stem helix peptide of all but one of the coronaviruses evaluated (hku , figure d-e), the sars-cov- s and sars-cov s ectodomain trimers bound to b with three orders of magnitude reduced avidities compared to mers-cov s (figure b-e).
whereas the b epitope is not resolved in any prefusion coronavirus s structures determined to date, the stem helix region directly upstream is resolved to a much greater extent for sars-cov- s and sars-cov s, indicating a rigid structure (gui et al., ; kirchdoerfer et al., ; walls et al., b; walls et al., ; yuan et al., ) compared to mers-cov s (pallesen et al., ; park et al., ; walls et al., ; yuan et al., ), oc s (tortorici et al., ), hku s (kirchdoerfer et al., ) or mhv s (walls et al., a) (figure a-c). furthermore, we determined b fab binding affinities of . µm and . µm for mers-cov s and oc s, respectively, whereas sars-cov s recognition was too weak to accurately quantitate (figure s ). these findings along with the largely hydrophobic nature of the b epitope, which is expected to be occluded in the center of a -helix bundle (figure d-e) (as is the case for the region directly n-terminal to it), suggest that b recognizes a cryptic epitope and that binding to s trimers is modulated (at least in part) by the quaternary structure of the stem. the reduced conformational dynamics of the sars-cov- s and sars-cov s stem helix quaternary structure is expected to limit b accessibility to its cryptic epitope relative to other coronavirus s glycoproteins (figure a-e). this hypothesis is supported by the correlation between neutralization potency and binding affinity which likely explains the lack of neutralization of lineage b β-coronaviruses. analysis of the postfusion mouse hepatitis s (walls et al., ), sars-cov- s (cai et al., ) and sars-cov s (fan et al., ) structures shows that the b epitope is buried at the interface with the other two protomers of the rod-shaped trimer. as a result, b binding appears to be incompatible with adoption of the postfusion s conformation (figure f). collectively, the data presented here suggest that b binding
sterically interferes with s fusogenic conformational changes and likely blocks viral entry through inhibition of membrane fusion (figure c-f), as proposed for fusion machinery-directed mabs against influenza virus (corti et al., ), ebolavirus (king et al., ) or hiv (kong et al., ).

figure . b binding disrupts the stem helix bundle and sterically inhibits membrane fusion. (a) cryoem map of prefusion sars-cov- s (emd- ) filtered at Å resolution to emphasize the intact trimeric stem helix bundle. (b) cryoem map of the mers-cov s–b complex showing a disrupted stem helix bundle. (c) model of b -induced s stem movement obtained through comparison of the apo sars-cov- s and b -bound mers-cov s structures. (d-f) proposed mechanism of inhibition mediated by the b mab. b binds to the hydrophobic core (red) of the stem helix bundle and disrupts its quaternary structure (d-e). the b disrupted state likely prevents s subunit refolding from the pre- to the post-fusion state and blocks viral entry (f).
discussion

the high sequence variability of viral glycoproteins was long considered as an unsurmountable obstacle to the development of mab therapies or vaccines conferring broad protection (corti and lanzavecchia, ). the identification of broadly neutralizing mabs targeting conserved hiv- envelope epitopes from infected individuals brought about a paradigm shift for this virus undergoing extreme antigenic drift (huang et al., ; kong et al., ; scheid et al., ; walker et al., ; walker et al., ; wu et al., ; zhou et al., ). heterotypic influenza virus neutralization was also described for human cross-reactive mabs recognizing the hemagglutinin receptor-binding site or the fusion machinery (corti et al., ; dreyfus et al., ; ekiert et al., ; ekiert et al., ; kallewaard et al., ; whittle et al., ). these findings were paralleled by efforts to identify broadly neutralizing abs against respiroviruses (corti et al., ), henipaviruses (dang et al., ; mire et al., ; zhu et al., ), dengue and zika viruses (barba-spaeth et al., ; dejnirattisai et al., ; rouvinski et al., ) or ebolaviruses (bornholdt et al., ; flyak et al., ; king et al., ; west et al., ). the genetic diversity of coronaviruses circulating in chiropteran and avian reservoirs along with the recent emergence of multiple highly pathogenic coronaviruses showcase the need for vaccines and therapeutics that protect humans against a broad range of viruses. as the s fusion machinery contains several important antigenic sites and is more conserved than the s subunit, it is an attractive target for broad-coronavirus neutralization (tortorici and veesler, ; walls et al., a).
previous studies described conserved epitopes targeted by neutralizing abs, such as the fusion peptide or heptad-repeats, as well as a variable loop in the mers-cov s connector domain (daniel et al., ; elshabrawy et al., ; pallesen et al., ; poh et al., ; walls et al., a; wec et al., ; zhang et al., ; zheng et al., ). the discovery of the b mab provides proof-of-concept of mab-mediated broad β-coronavirus neutralization and uncovers a previously unknown conserved cryptic epitope that is predicted to be located in the hydrophobic core of the stem helix. b cross-reacts with at least eight distinct s glycoproteins, from β-coronaviruses belonging to lineages a, b and c, and broadly neutralizes two human and one bat pseudotyped viruses from lineages a and c. b could be used for detection or diagnosis of coronavirus infection and humanized versions of this mab are promising candidate therapeutics against emerging and re-emerging β-coronaviruses from lineages a and c. our data further suggest that affinity maturation of b using sars-cov- s and sars-cov s might enhance recognition of and extend neutralization breadth towards β-coronaviruses from lineage b.
finally, the identification of the conserved b epitope paves the way for epitope-focused vaccine design (azoitei et al., ; correia et al., ; sesterhenn et al., ) that could elicit pan-coronavirus immunity, as supported by the elicitation of the b mab through vaccination and the recent findings that humans and camels infected with mers-cov, humans infected with sars-cov- and humanized mice immunized with a cocktail of coronavirus s glycoproteins produce antibodies targeting an epitope similar to the one targeted by b (song et al., ; wang et al., b).

acknowledgments

we thank hideki tani (university of toyama) for providing the reagents necessary for preparing vsv pseudotyped viruses and brooke fiala for assisting with protein production. this study was supported by the national institute of general medical sciences (r gm to d.v.), the national institute of allergy and infectious diseases (dp ai and hhsn c to d.v.), a pew biomedical scholars award (d.v.), an investigators in the pathogenesis of infectious disease award from the burroughs wellcome fund (d.v.), a fast grants (d.v.), the university of washington arnold and mabel beckman cryoem center, the swiss national science foundation (p pb_ to m.m.s.), the pasteur institute (m.a.t.), the m.j. murdock charitable trust (a.t.m. and b.h.), and beamlines . . and . . at the advanced light source at lawrence berkeley national laboratory.

declaration of interests

m.m.s., m.a.t., y.j.p., a.c.w., a.t.m. and d.v. are named as inventors on patent applications filed by the university of washington based on the studies presented in this paper.
d.v. is a consultant for vir biotechnology inc. the veesler laboratory has received an unrelated sponsored research agreement from vir biotechnology inc. the other authors declare no competing interests.

supplementary information

table . cryoem data collection and refinement statistics.
b /mers-cov-s (c map, post polishing) b /mers-cov-s (c map, before polishing)
data collection
magnification , ,
voltage (kv)
total exposure (e-/Å )
defocus range (µm) - . to - . - . to - .
pixel size (Å) . .
initial particle stack , ,
final particle stack , ,
map resolution ( . fsc threshold) (Å) . .
map b-factor - . - .
symmetry c c
model refinement
model resolution ( . fsc threshold) (Å) .
model composition
nonhydrogen atoms ,
protein residues
ligand
mean b-factors (Å )
protein .
ligand .
r.m.s. deviations
bond lengths (Å) .
bond angles (°) .
validation
molprobity score .
clash score .
rotamer outliers (%) .
ramachandran
favored (%) .
allowed (%) .
disallowed (%) .

table . x-ray crystallography data collection and refinement statistics.
complex b /mers-cov aa b /mers-cov aa b /hku b /oc b /sars-cov/sars-cov-
data collection
space group c c c c c
cell constants a,b,c (Å) . , . , . . , . , . . , .
, . . , . , . . , . , .
α,β,γ (°) , . , , . , , . , , . , , . ,
wavelength (Å) . . . . .
resolution (Å) . - . ( . - . ) . - . ( . - . ) . - . ( . - . ) . - . ( . - . ) . - . ( . - . )
rmerge (%) . ( . ) . ( . ) . ( . ) . ( . ) . ( . )
i/σ(i) . ( . ) . ( . ) . ( . ) . ( . ) . ( . )
cc( / ) . ( . ) ( . ) . ( . ) . ( . ) ( . )
completeness (%) . ( . ) . ( . ) . ( . ) . ( . ) . ( . )
redundancy . ( . ) . ( . ) . ( . ) . ( . ) . ( . )
refinement
resolution (Å) . - . . - . . - . . - . . - .
unique reflections , , , , ,
rwork/rfree (%) . / . . / . . / . . / . . / .
number of protein atoms
number of waters
r.m.s.d. bond lengths (Å) . . . . .
r.m.s.d. bond angles (°) . . . . .
ramachandran favored (%) . . . . .
ramachandran allowed (%) . . . . .
ramachandran outliers (%)
a numbers in parentheses refer to outer resolution shell

figure s . mers-cov s, sars-cov s and sars-cov- s pseudotyped virus neutralization. neutralization assays of mlv (a-c) or vsv (d-f) particles pseudotyped with (a,d) mers-cov s (b,e) sars-cov s and (c,f) sars-cov- s were performed in the presence of the indicated concentration of b mab. data were evaluated using a non-linear sigmoidal regression model with variable hill slope. experiments were performed in triplicates with at least two independent mab and pseudotyped virus preparations.
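
the neutralization data are described as fits of a sigmoidal model with variable hill slope. as a rough illustration only, the sketch below uses a two-parameter inhibition curve (top fixed at 1, bottom fixed at 0) and estimates the half-maximal inhibitory concentration and the hill slope by a log-logit linearization. this is a simpler stand-in for the non-linear regression named in the caption and is not the authors' procedure; the function names and the synthetic concentrations are hypothetical.

```python
import math


def hill_inhibition(conc, ic50, hill):
    """Fraction of viral entry remaining at antibody concentration `conc`.

    Two-parameter Hill curve: 1 at conc = 0, 0.5 at conc = ic50, 0 at saturation.
    `conc` and `ic50` must share the same units.
    """
    return 1.0 / (1.0 + (conc / ic50) ** hill)


def fit_hill_loglogit(concs, fractions):
    """Estimate (ic50, hill) by ordinary least squares on logit-transformed data.

    For the curve above, ln(f / (1 - f)) = hill * ln(ic50) - hill * ln(conc),
    so a straight-line fit of logit(f) against ln(conc) has slope -hill and
    intercept hill * ln(ic50). Points with f <= 0 or f >= 1 are skipped
    because the logit is undefined there.
    """
    xs, ys = [], []
    for c, f in zip(concs, fractions):
        if c > 0 and 0.0 < f < 1.0:
            xs.append(math.log(c))
            ys.append(math.log(f / (1.0 - f)))
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two usable points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    hill = -slope
    ic50 = math.exp(intercept / hill)
    return ic50, hill


# Synthetic, noise-free example (values are made up for illustration).
example_concs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
example_fracs = [hill_inhibition(c, 2.0, 1.5) for c in example_concs]
est_ic50, est_hill = fit_hill_loglogit(example_concs, example_fracs)
```

with real, noisy triplicate data a non-linear least-squares fit that also estimates the top and bottom plateaus would normally be preferred, since the logit transform amplifies noise near 0 and 1.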
figure s . cryoem characterization of the b -bound mers-cov s complex. (a) representative cryoem micrograph of the mers-cov s prefusion trimer bound to b embedded in vitreous ice. scale bar: nm. (b) selected reference-free d class averages. scale bar: nm. (c) fourier shell correlation curves for the reconstructions shown in panels d and e. (d) reconstruction obtained with all selected particles and applying c symmetry colored by local resolution. (e) reconstruction obtained with a subset of particles obtained through focused classification to improve b resolvability colored by local resolution.

figure s . protein sequence alignment of the stem region for selected β-coronavirus s glycoproteins. the sequence alignment was performed based on mers-cov s using the following s protein sequences: mers-cov emc/ (genbank: afs . ), hku (uniprotkb: a ex . ), hku (uniprotkb: a exd . ), hku isolate n (uniprotkb: q zme . ), mhv a (uniprotkb: p . ), oc (uniprotkb: q p ), sars-cov urbani (genbank: aap . ), sars-cov- (ncbi reference sequence: yp_ . ). sequence alignment was performed using multalin (corpet, ) and visualized using espript . (robert and gouet, ). the conserved stem helix recognized by b is indicated.

figure s . crystal structures of b bound to coronavirus s stem helix peptides.
stem peptides of (a) mers-cov s (b) oc s (c) sars-cov/sars-cov- s and (d) hku s are shown in stick representation with carbon atoms colored yellow. b is shown in ribbon representation with interacting residues rendered as stick representation in gray. oxygen and nitrogen atoms are colored red and blue, respectively. the fo-fc maps for the different peptides are shown as a blue mesh at a contour level of σ.

figure s . b binding kinetics to different coronavirus s ectodomain trimers. (a-c) binding of b to immobilized (a) mers-cov s, (b) oc s and (c) sars-cov s measured by biolayer interferometry. the vertical dotted lines correspond to the transition between the association and dissociation phases. data are shown for one representative measurement and were analyzed with the octetbio software. global fits are shown as dashed lines. we determined dissociation constant (kd) values of . ( . ) ± . and . ( . ) ± . µm for two independent batches of s protein for mers-cov s and oc s, respectively. the dissociation constant for sars-cov s could not be evaluated reliably; however, the predicted affinity is significantly lower compared to the other two s proteins.
methods

identification of the b broadly neutralizing mab

ten-week-old cd- mice were injected twice with µg of mers-cov s formulated with adjuplex at weeks and and once with µg of sars-cov s formulated with adjuplex at week at the fred hutchinson cancer research center antibody technology resource. days after the final injection, splenocytes were isolated from high-titer mice and electrofused with the p x -ag myeloma cell line (btx, harvard apparatus). hybridoma supernatants were tested for binding to prefusion sars-cov s, mers-cov s, sars-cov s subunit, mers-cov s subunit and respiratory syncytial virus f (which harbors a foldon motif and a his tag similar to the sars-cov s and mers-cov s ectodomain trimer constructs) using a high-throughput bead-based binding array. hybridomas from wells containing supernatants that were positive for binding to prefusion sars-cov s and mers-cov s but negative for sars-cov s , mers-cov s , and respiratory syncytial virus f were sub-cloned by limiting dilution and re-screened for binding as above. the vh and vl sequences of b were recovered with the mouse ig primer set (millipore) following the protocol outlined in (siegel, ), and sanger sequenced (genewiz). the vh/vl sequences were codon-optimized and cloned into full-length ptt derived igg and igl kappa expression vectors containing human constant regions using gibson assembly (snijder et al., ).

protein expression and purification

mers-cov p s, oc s, sars-cov p s and sars-cov- p s were produced as previously described (tortorici et al., ; walls et al., b; walls et al., ). briefly, all ectodomains were produced in hek f cells grown in suspension using freestyle expression medium (life technologies) at °c in a humidified % (v/v) co incubator rotating at r.p.m. the cultures were transfected using fectin (thermofisher scientific) with cells grown to a density of cells/ml and cultivated for three days.
the supernatants were harvested and cells resuspended for another three days, yielding two harvests. for mers-cov p s, sars-cov p s and sars-cov- p s, clarified supernatants were purified using a ml cobalt affinity column (takara). hcov- oc s was purified using a streptrap hp column (ge healthcare). purified proteins were concentrated, flash-frozen in tris-saline ( mm tris, ph . ( °c), mm nacl) and stored at - °c. the mers-cov s -fc and sars-cov s -fc constructs were previously described (raj et al., ); they were produced as described above for the prefusion s trimers and purified using protein a affinity chromatography. for mab b production, µg of b heavy and µg of b light chain encoding plasmids were co-transfected per liter of suspended hek f culture using free transfection reagent (millipore sigma) according to the manufacturer’s instructions. cells were transfected at a density of cells/ml. expression was carried out for days, after which cells and cellular debris were removed by centrifugation at , × g followed by filtration through a . µm filter. clarified cell supernatant containing recombinant mab was passed over protein a agarose resin (thermo fisher scientific). protein a resin was extensively washed with mm phosphate ph . , mm nacl (pbs) and eluted with igg elution buffer (thermo scientific). purified b was extensively dialyzed against pbs, concentrated, flash-frozen and stored at - °c. ds-cav -foldon-spytag (mclellan et al., ) was produced by lentiviral transduction of hek f cells using the daedalus system (bandaranayake et al., ).
lentivirus was produced by transient transfection of hek t cells (atcc) using linear kda polyethyleneimine (pei; polysciences). briefly, × ^ cells were plated onto cm tissue culture plates. after h, mg of pspax , . mg of pmd g (addgene plasmids # and # , respectively), and mg of lentiviral vector plasmid were mixed in ml diluent ( mm hepes, mm nacl, ph . ) and ml of pei ( mg/ml) and incubated for minutes. the dna/pei complex was then added to the plate dropwise. lentivirus was harvested h post-transfection and concentrated × by centrifugation at ×g for h. transduction of the target cell line was carried out in ml shake flasks containing × ^ cells in ml of growth media. μl of × lentivirus was added to the flask and the cells were incubated with rpm oscillation at °c in % co for – hours, after which ml of growth media was added to the shake flask. transduced cells were expanded every other day to a density of × ^ cells/ml until a final culture size of l was reached. the media was harvested after days of total incubation after measuring final cell concentration (~ × ^ cells/ml) and viability (~ % viable). culture supernatant was harvested by low-speed centrifugation to remove cells from the supernatant. nacl and nan were added to final concentrations of mm and . %, respectively. the supernatant was loaded onto one ml histrap ff crude column (ge healthcare) at ml/min using an akta pure (ge healthcare). the ml histrap column was washed with column volumes of wash buffer ( × gibco - pbs, mm imidazole, ph .
) followed by column volumes of elution buffer ( × gibco - pbs, mm imidazole, ph . ). the nickel elution was applied to a hiload / superdex pg column (ge healthcare) and run in dpbs (gibco - ) with % glycerol (thermo bp - ) to further purify the target protein by size-exclusion chromatography. the purified protein was snap frozen in liquid nitrogen and stored at - °c.

kinetics of b mab binding to coronavirus s proteins

the avidities of complex formation between b mab and selected coronavirus s proteins were determined in pbs supplemented with . % tween and . % bsa (pbstb) at °c and , rpm shaking on an octet red instrument (fortebio). curve fitting was performed using a : binding model and the fortebio data analysis software. kd ranges were determined with a global fit. ahc biosensors (fortebio) were hydrated in water and subsequently equilibrated in pbstb buffer. μg/ml b mab was loaded onto the biosensors to a shift of approximately nm. then, the system was equilibrated in pbstb buffer for s prior to immersing the sensors in the respective coronavirus s protein ( - nm) for up to s prior to dissociation in buffer for an additional s.

binding of b to different synthetic coronavirus s stem peptides

analysis of b binding to selected biotinylated coronavirus s stem helix peptides was performed in pbs supplemented with . % tween (pbst) at °c and , rpm shaking on an octet red instrument (fortebio). µg/ml biotinylated stem peptide ( - or -residue long stem peptide-peg -lys-biotin synthesized from genscript) was loaded on sa biosensors to a threshold of . nm. then, the system was equilibrated in pbst for s prior to immersing the sensors in . µm b mab or µm b fab, respectively, for s prior to dissociation in buffer for s.
kinetics of b fab binding to different coronavirus s proteins

the rate constants of binding (kon) and dissociation (koff) for the complex between the b fab and selected coronavirus s proteins were determined in pbst at °c and , rpm shaking on an octet red instrument (fortebio). global curve fitting was performed using a : binding model and the fortebio data analysis software. for mers-cov s and sars-cov s, his k or ni-nta biosensors (fortebio) were hydrated in water and subsequently equilibrated in pbst buffer. μg/ml sars-cov s or μg/ml mers-cov s, respectively, were loaded onto the biosensors for up to s ( - nm shift). the system was equilibrated in pbst for s prior to immersing the sensors in b fab ( - µm) for up to s prior to dissociation in buffer for s. for oc s, arg biosensors were hydrated in water then activated for s with an nhs-edc solution (fortebio) prior to amine coupling. μg/ml oc was amine coupled to ar g (fortebio) sensors in mm acetate ph . (fortebio) for s and then quenched with m ethanolamine (fortebio) for s. the system was equilibrated in pbst for s prior to immersing the sensors in b fab ( - µm) for s prior to dissociation in buffer for s.

pseudovirus entry assays

production of oc s pseudotyped vsv virus and the neutralization assay were performed as described previously (hulswit et al., ; tortorici et al., ). briefly, hek- t cells at ~ % confluency were transfected with the pcaggs expression vectors encoding full-length oc s with a truncation of the c-terminal residues (to increase cell surface expression levels) along with fusion to a flag tag and the fc-tagged bovine coronavirus hemagglutinin esterase protein at molar ratios of : .
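As a reading aid for the biolayer interferometry fits described in the kinetics sections above, the sketch below writes out the standard one-site (Langmuir) binding equations that relate kon, koff and kd. It is a minimal illustration under assumptions: the text does not describe how the fortebio software implements its fit, and all function names and numeric values here are hypothetical.

```python
import math


def kd_from_rates(kon, koff):
    """Equilibrium dissociation constant of a one-site model: kd = koff / kon."""
    return koff / kon


def association(t, conc, kon, koff, rmax):
    """Sensor response during association at analyte concentration `conc`.

    The observed rate is kobs = kon * conc + koff, and the response
    approaches the plateau rmax * conc / (conc + kd).
    """
    kobs = kon * conc + koff
    plateau = rmax * conc / (conc + koff / kon)
    return plateau * (1.0 - math.exp(-kobs * t))


def dissociation(t, r0, koff):
    """Exponential decay of the response after transfer to buffer."""
    return r0 * math.exp(-koff * t)


# Hypothetical rate constants, chosen only to exercise the functions.
kon, koff = 1e5, 1e-3          # units: 1/(M s) and 1/s
kd = kd_from_rates(kon, koff)  # molar
half_plateau = association(1e6, kd, kon, koff, rmax=2.0)
```

At an analyte concentration equal to kd the plateau is half of rmax, which is a quick sanity check on fitted parameters. A global fit, as named in the text, would share one kon and one koff across all concentrations.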
h after transfection, cells were transduced with vsv∆g/fluc (bearing the photinus pyralis firefly luciferase) (kaname et al., ) at a multiplicity of infection of . twenty-four hours later, supernatant was harvested and filtered through a . μm membrane. pseudotyped vsv virus was titrated on monolayers of hrt- cells. in the virus neutralization assay, serially diluted mabs were pre-incubated with an equal volume of virus at room temperature for h, and then inoculated on hrt- cells, and further incubated at °c. after h, cells were washed once with pbs, lysed with cell lysis buffer (promega) and firefly luciferase expression was measured on a berthold centro lb plate luminometer using d-luciferin as a substrate (promega). percentage of infectivity was calculated as the luciferase readout in the presence of mabs normalized to the luciferase readout in the absence of mab, and half maximal inhibitory concentrations (ic ) were determined using -parameter logistic regression (graphpad prism v . ). mers-cov s, sars-cov s and sars-cov- s pseudotyped vsv were prepared using t cells seeded in -cm dishes in dmem supplemented with % fbs, % penstrep and transfected with plasmids encoding the corresponding s glycoprotein ( µg/dish) using lipofectamine (life technologies) according to the manufacturer’s instructions. one day post-transfection, cells were infected with vsv(g*Δg-luciferase). after h, infected cells were washed four times with dmem before medium supplemented with anti-vsv-g antibody (i - mouse hybridoma supernatant diluted to , from crl- , atcc) was added.
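The normalization and curve fit described above for the oc s neutralization assay can be sketched as follows. This is an illustration under stated assumptions, not the graphpad prism procedure: the number of logistic parameters is not legible in the text, so the common four-parameter form is assumed; the top and bottom plateaus are fixed and a plain grid search replaces a proper non-linear optimizer. All names and the synthetic dilution series are hypothetical.

```python
import math


def percent_infectivity(signal_with_mab, signal_without_mab):
    """Luciferase readout with mAb, normalized to the no-mAb control (percent)."""
    return 100.0 * signal_with_mab / signal_without_mab


def logistic_4p(conc, bottom, top, ic50, hill):
    """Four-parameter logistic curve: falls from `top` to `bottom` as conc rises."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill)


def fit_ic50(concs, responses, bottom=0.0, top=100.0):
    """Least-squares grid search over log10(IC50) and the Hill slope.

    Assumptions of this sketch: `bottom` and `top` are held fixed, and the
    IC50 is searched only within the tested concentration range.
    """
    lo, hi = math.log10(min(concs)), math.log10(max(concs))
    best = None
    for i in range(201):                      # 201 log-spaced IC50 candidates
        ic50 = 10.0 ** (lo + (hi - lo) * i / 200)
        for j in range(1, 61):                # Hill slopes 0.1 to 6.0
            hill = 0.1 * j
            sse = sum((logistic_4p(c, bottom, top, ic50, hill) - r) ** 2
                      for c, r in zip(concs, responses))
            if best is None or sse < best[0]:
                best = (sse, ic50, hill)
    return best[1], best[2]


# Hypothetical dilution series (arbitrary units) generated from a known curve.
concs = [0.01, 0.1, 1.0, 10.0, 100.0]
responses = [logistic_4p(c, 0.0, 100.0, 1.0, 1.0) for c in concs]
ic50, hill = fit_ic50(concs, responses)
```

Searching the IC50 on a log scale mirrors the serial-dilution design, where concentrations are evenly spaced in log units. With real plate data the plateaus would normally be fitted as well.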
particles were harvested h post-inoculation, clarified from cellular debris by centrifugation at , x g for min and used for neutralization experiments. mers-cov s, sars-cov s, and sars-cov- s pseudotyped mlv were prepared as previously described (walls et al., b). for viral neutralization, huh cells (for mers-cov s pseudotyped virus) or stable t cells expressing ace (crawford et al., ) (for sars-cov s and sars-cov- s pseudotyped viruses) in dmem supplemented with % fbs, % penstrep were seeded at , cells/well into clear-bottom, white-walled -well plates and cultured overnight at °c. twelve-point -fold serial dilutions of b mab were prepared in dmem and pseudotyped vsv were added : to each b dilution in the presence of anti-vsv-g mab from i - mouse hybridoma supernatant diluted times. after min incubation at °c, µl of the mixture was added to the cells and h post-infection, μl dmem was added to the cells for - h. following infection, medium was removed and μl one-glo-ex substrate (promega) was added to the cells and incubated in the dark for min prior to reading on a varioskan lux plate reader (thermofisher).

western blots

sds–page ( x) loading buffer was added to all concentrated pseudovirus samples. the samples were run on a – % (wt/vol) gradient tris-glycine gel (biorad) and transferred to pvdf membranes. b was used as the primary ab ( : dilution) and an alexa fluor -conjugated goat anti-human ab ( : , dilution, jackson laboratory) as the secondary ab for western blotting. a li-cor processor was used to develop images.

cryoem sample preparation and data collection
lacey carbon copper grids ( mesh) were coated with a thin layer of continuous carbon using a carbon evaporator. mg/ml mers-cov s was incubated with mm neuraminic acid (to promote the closed trimer conformation), mm tris ph ( °c) mm nacl for h at °c. then a -fold molar excess of b fab over mers-cov s protomer was added to the solution and incubated for h at °c. the sample was diluted to . mg/ml s protein with mm neuraminic acid- mm tris ph ( °c) mm nacl before µl of sample were applied onto a freshly glow-discharged grid. plunge freezing was performed using a tfs vitrobot mark iv (blot force: - , blot time: . s, humidity: %, temperature: °c). data were acquired using an fei titan krios transmission electron microscope operated at kv and equipped with a gatan k summit direct detector and gatan quantum gif energy filter, operated in zero-loss mode with a slit width of ev. automated data collection was carried out using leginon (suloway et al., ) at a nominal magnification of , x with a pixel size of . Å. the dose rate was adjusted to counts/pixel/s, and each movie was acquired in super-resolution mode fractionated in frames of ms. , micrographs were collected in a single session with a defocus range comprised between - . and - . μm.

cryoem data processing

movie frame alignment, estimation of the microscope contrast-transfer function parameters, particle picking and extraction were carried out using warp (tegunov and cramer, ). particle images were extracted with a box size of pixels binned to pixels yielding a pixel size of . Å. two rounds of reference-free d classification were performed using relion . (zivanov et al., ) to select well-defined particle images. subsequently, two rounds of d classification with iterations each (angular
sampling . ° for iterations and . ° with local search for iterations), using the previously reported closed mers-cov s structure without the g fab (pdb w j) as initial model were carried out using relion without imposing symmetry. for the high resolution map, particle images were subjected to bayesian polishing (zivanov et al., ) before performing non-uniform refinement, defocus refinement and non-uniform refinement again in cryosparc (punjani et al., ). finally, two rounds of global ctf refinement of beam-tilt, trefoil and tetrafoil parameters were performed before a final round of non-uniform refinement to produce the . Å resolution map. for the lower resolution map, one additional round of focused classification in relion with iterations using a broad mask covering the region of interest (b /stem) was carried out to further separate distinct b fab conformations. d refinements of the best subclasses were carried out using homogeneous refinement in cryosparc (punjani et al., ). reported resolutions are based on the gold-standard fourier shell correlation (fsc) of . criterion, and fourier shell correlation curves were corrected for the effects of soft masking by high-resolution noise substitution (chen et al., ).

cryoem model building and analysis

ucsf chimera (pettersen et al., ) and coot (emsley et al., ) were used to fit atomic models into the cryoem maps. the mers-cov s em structure in complex with - n-acetyl neuraminic acid (pdb q , residue - ) and the b -mers-cov (residue - ) crystal structure were fit into the cryoem map. subsequently the linker connecting the stem helix to the rest of the mers-cov s ectodomain (residue - ) was manually built using coot.
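For the fsc-based resolution estimate mentioned in the data processing section above, the sketch below shows how a resolution is read off a precomputed fsc curve: find the first shell where the curve drops below the chosen threshold and interpolate linearly. It is illustrative only and does not reproduce the relion or cryosparc implementations. The threshold is passed explicitly because its value is not legible in the text; 0.143, the value conventionally paired with gold-standard fsc, appears in the usage lines as an assumption, and the curve itself is made up.

```python
def resolution_at_threshold(freqs, fsc, threshold):
    """Resolution (in Å) where the FSC curve first falls below `threshold`.

    freqs: spatial frequencies in 1/Å, strictly increasing.
    fsc:   FSC value for each frequency shell.
    Assumes the curve starts at or above the threshold; returns None if it
    never drops below it.
    """
    for k in range(1, len(freqs)):
        if fsc[k - 1] >= threshold > fsc[k]:
            # Linear interpolation between the two bracketing shells.
            frac = (fsc[k - 1] - threshold) / (fsc[k - 1] - fsc[k])
            f_cross = freqs[k - 1] + frac * (freqs[k] - freqs[k - 1])
            return 1.0 / f_cross
    return None


# Hypothetical FSC curve; 0.143 is an assumed threshold (see lead-in).
freqs = [0.05, 0.10, 0.20, 0.30, 0.40]
fsc = [1.00, 0.98, 0.80, 0.30, 0.02]
resolution = resolution_at_threshold(freqs, fsc, threshold=0.143)
```

Because resolution is the reciprocal of spatial frequency, the interpolation is done in frequency space and inverted only at the end.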
n-linked glycans were hand-built into the density where visible and the models were refined and relaxed with rosetta using both sharpened and unsharpened maps (frenz et al., ; wang et al., ). models were analyzed using molprobity (chen et al., ), emringer (barad et al., ), phenix (liebschner et al., ) and privateer (agirre et al., ) to validate the stereochemistry of both the protein and glycan components. figures were generated using ucsf chimera.

crystallization and structure determination

all crystallization experiments were performed at °c by hanging-drop vapor diffusion with initial concentrations of mg/ml and . -fold molar excess of peptide ligand. crystal trays were set up with a mosquito using nl mother liquor solution and or nl b /peptide complex solution, respectively. crystals of b /mers-cov and b /oc appeared after several weeks in . m potassium thiocyanate and % (w/v) peg , b /mers-cov in . m magnesium chloride and % (w/v) peg , b /hku in . m sodium chloride, . m mes-naoh, ph . and % (w/v) peg , b -sars-cov/sars-cov- in . m potassium chloride and % (w/v) peg . crystals were cryoprotected by addition of glycerol to a final concentration of % (v/v) and flash cooled in liquid nitrogen. diffraction data were collected at the beamlines . . and . . (advanced light source, berkeley, usa). all data were integrated, indexed and scaled using mosflm (battye et al., ) and aimless (evans and murshudov, ) or xds (kabsch, ).
the structures were solved by molecular replacement using phaser (mccoy et al., ) and the s fab (pdb nb ) or b fab without ligand as a search model. model building was performed with coot (emsley et al., ) and structure refinement with buster (blanc et al., ) and phenix (liebschner et al., ). validation used molprobity (chen et al., ) and phenix (liebschner et al., ).

data availability

the atomic coordinates and cryoem maps will be deposited to the protein data bank and electron microscopy data bank.

references

agirre, j., iglesias-fernández, j., rovira, c., davies, g.j., wilson, k.s., and cowtan, k.d. ( ). privateer: software for the conformational validation of carbohydrate structures. nat struct mol biol , - . alsoussi, w.b., turner, j.s., case, j.b., zhao, h., schmitz, a.j., zhou, j.q., chen, r.e., lei, t., rizk, a.a., mcintire, k.m., et al. ( ). a potently neutralizing antibody protects mice against sars-cov- infection. j immunol. anthony, s.j., gilardi, k., menachery, v.d., goldstein, t., ssebide, b., mbabazi, r., navarrete-macias, i., liang, e., wells, h., hicks, a., et al. ( ). further evidence for bats as the evolutionary source of middle east respiratory syndrome coronavirus. mbio . azoitei, m.l., correia, b.e., ban, y.e., carrico, c., kalyuzhniy, o., chen, l., schroeter, a., huang, p.s., mclellan, j.s., kwong, p.d., et al. ( ). computation-guided backbone grafting of a discontinuous motif onto a protein scaffold. science , - . bandaranayake, a.d., correnti, c., ryu, b.y., brault, m., strong, r.k., and rawlings, d.j. ( ).
daedalus: a robust, turnkey platform for rapid production of decigram quantities of active recombinant proteins in human cell lines using novel lentiviral vectors. nucleic acids res , e . barad, b.a., echols, n., wang, r.y., cheng, y., dimaio, f., adams, p.d., and fraser, j.s. ( ). emringer: side chain-directed model and map validation for d cryo-electron microscopy. nat methods , - . barba-spaeth, g., dejnirattisai, w., rouvinski, a., vaney, m.c., medits, i., sharma, a., simon- lorière, e., sakuntabhai, a., cao-lormeau, v.m., haouz, a., et al. ( ). structural basis of potent zika-dengue virus antibody cross-neutralization. nature , - . barnes, c.o., jette, c.a., abernathy, m.e., dam, k.-m.a., esswein, s.r., gristick, h.b., malyutin, a.g., sharaf, n.g., huey-tubman, k.e., lee, y.e., et al. ( a). structural classification of neutralizing antibodies against the sars-cov- spike receptor-binding domain suggests vaccine and therapeutic strategies. biorxiv, . . . . barnes, c.o., west, a.p., huey-tubman, k.e., hoffmann, m.a.g., sharaf, n.g., hoffman, p.r., koranda, n., gristick, h.b., gaebler, c., muecksch, f., et al. ( b). structures of human antibodies bound to sars-cov- spike reveal common epitopes and recurrent features of antibodies. cell. battye, t.g., kontogiannis, l., johnson, o., powell, h.r., and leslie, a.g. ( ). imosflm: a new graphical interface for diffraction-image processing with mosflm. acta crystallogr d biol crystallogr , - . blanc, e., roversi, p., vonrhein, c., flensburg, c., lea, s.m., and bricogne, g. ( ). refinement of severely incomplete structures with maximum likelihood in buster-tnt. acta crystallogr d biol crystallogr , - . bornholdt, z.a., turner, h.l., murin, c.d., li, w., sok, d., souders, c.a., piper, a.e., goff, a., shamblin, j.d., wollen, s.e., et al. ( ). isolation of potent neutralizing antibodies from a survivor of the ebola virus outbreak. science , - . 
brouwer, p.j.m., caniels, t.g., van der straten, k., snitselaar, j.l., aldon, y., bangaru, s., torres, j.l., okba, n.m.a., claireaux, m., kerster, g., et al. ( ). potent neutralizing antibodies from covid- patients define multiple targets of vulnerability. science. cai, y., zhang, j., xiao, t., peng, h., sterling, s.m., walsh, r.m., rawson, s., rits-volloch, s., and chen, b. ( ). distinct conformational states of sars-cov- spike protein. science , - . case, j.b., rothlauf, p.w., chen, r.e., kafai, n.m., fox, j.m., smith, b.k., shrihari, s., mccune, b.t., harvey, i.b., keeler, s.p., et al. ( ). replication-competent vesicular stomatitis virus vaccine vector protects against sars-cov- -mediated pathogenesis in mice. cell host microbe , - .e . chen, s., mcmullan, g., faruqi, a.r., murshudov, g.n., short, j.m., scheres, s.h., and henderson, r. ( ). high-resolution noise substitution to measure overfitting and validate resolution in d structure determination by single particle electron cryomicroscopy. ultramicroscopy , - . chen, v.b., arendall, w.b., headd, j.j., keedy, d.a., immormino, r.m., kapral, g.j., murray, l.w., richardson, j.s., and richardson, d.c. ( ). molprobity: all-atom structure validation for macromolecular crystallography. acta crystallogr d biol crystallogr , - . corbett, k.s., edwards, d.k., leist, s.r., abiona, o.m., boyoglu-barnum, s., gillespie, r.a., himansu, s., schäfer, a., ziwawo, c.t., dipiazza, a.t., et al. ( ). sars-cov- mrna vaccine design enabled by prototype pathogen preparedness. nature , - . corpet, f. ( ).
multiple sequence alignment with hierarchical clustering. nucleic acids res , - . correia, b.e., bates, j.t., loomis, r.j., baneyx, g., carrico, c., jardine, j.g., rupert, p., correnti, c., kalyuzhniy, o., vittal, v., et al. ( ). proof of principle for epitope-focused vaccine design. nature , - . corti, d., bianchi, s., vanzetta, f., minola, a., perez, l., agatic, g., guarino, b., silacci, c., marcandalli, j., marsland, b.j., et al. ( ). cross-neutralization of four paramyxoviruses by a human monoclonal antibody. nature , - . corti, d., and lanzavecchia, a. ( ). broadly neutralizing antiviral antibodies. annu rev immunol , - . corti, d., voss, j., gamblin, s.j., codoni, g., macagno, a., jarrossay, d., vachieri, s.g., pinna, d., minola, a., vanzetta, f., et al. ( ). a neutralizing antibody selected from plasma cells that binds to group and group influenza a hemagglutinins. science , - . corti, d., zhao, j., pedotti, m., simonelli, l., agnihothram, s., fett, c., fernandez-rodriguez, b., foglierini, m., agatic, g., vanzetta, f., et al. ( ). prophylactic and postexposure efficacy of a potent human monoclonal antibody against mers coronavirus. proc natl acad sci u s a , - . crawford, k.h.d., eguia, r., dingens, a.s., loes, a.n., malone, k.d., wolf, c.r., chu, h.y., tortorici, m.a., veesler, d., murphy, m., et al. ( ). protocol and reagents for pseudotyping lentiviral particles with sars-cov- spike protein for neutralization assays. viruses . dang, h.v., chan, y.p., park, y.j., snijder, j., da silva, s.c., vu, b., yan, l., feng, y.r., rockx, b., geisbert, t.w., et al. ( ). an antibody against the f glycoprotein inhibits nipah and hendra virus infections. nat struct mol biol. daniel, c., anderson, r., buchmeier, m.j., fleming, j.o., spaan, w.j., wege, h., and talbot, p.j. ( ). identification of an immunodominant linear neutralization domain on the s portion of the murine coronavirus spike glycoprotein and evidence that it forms part of complex tridimensional structure. 
j virol , - . dejnirattisai, w., wongwiwat, w., supasa, s., zhang, x., dai, x., rouvinski, a., jumnainsong, a., edwards, c., quyen, n.t.h., duangchinda, t., et al. ( ). a new class of highly potent, broadly neutralizing antibodies isolated from viremic patients infected with dengue virus. nat immunol , - . dreyfus, c., laursen, n.s., kwaks, t., zuijdgeest, d., khayat, r., ekiert, d.c., lee, j.h., metlagel, z., bujny, m.v., jongeneelen, m., et al. ( ). highly conserved protective epitopes on influenza b viruses. science , - . drosten, c., gunther, s., preiser, w., van der werf, s., brodt, h.r., becker, s., rabenau, h., panning, m., kolesnikova, l., fouchier, r.a., et al. ( ). identification of a novel coronavirus in patients with severe acute respiratory syndrome. n engl j med , - . edridge, a.w.d., kaczorowska, j., hoste, a.c.r., bakker, m., klein, m., loens, k., jebbink, m.f., matser, a., kinsella, c.m., rueda, p., et al. ( ). seasonal coronavirus protective immunity is short-lasting. nat med. ekiert, d.c., friesen, r.h., bhabha, g., kwaks, t., jongeneelen, m., yu, w., ophorst, c., cox, f., korse, h.j., brandenburg, b., et al. ( ). a highly conserved neutralizing epitope on group influenza a viruses. science , - . ekiert, d.c., kashyap, a.k., steel, j., rubrum, a., bhabha, g., khayat, r., lee, j.h., dillon, m.a., o'neil, r.e., faynboym, a.m., et al. ( ). cross-neutralization of influenza a viruses mediated by a single antibody loop. nature , - . elshabrawy, h.a., coughlin, m.m., baker, s.c., and prabhakar, b.s. ( ).
human monoclonal antibodies against highly conserved hr and hr domains of the sars-cov spike protein are more broadly neutralizing. plos one , e . emsley, p., lohkamp, b., scott, w.g., and cowtan, k. ( ). features and development of coot. acta crystallographica section d , - . evans, p.r., and murshudov, g.n. ( ). how good are my data and what is the resolution? acta crystallogr d biol crystallogr , - . fan, x., cao, d., kong, l., and zhang, x. ( ). cryo-em analysis of the post-fusion structure of the sars-cov spike glycoprotein. nat commun , . flyak, a.i., kuzmina, n., murin, c.d., bryan, c., davidson, e., gilchuk, p., gulka, c.p., ilinykh, p.a., shen, x., huang, k., et al. ( ). broadly neutralizing antibodies from human survivors target a conserved site in the ebola virus glycoprotein hr -mper region. nat microbiol , - . folegatti, p.m., ewer, k.j., aley, p.k., angus, b., becker, s., belij-rammerstorfer, s., bellamy, d., bibi, s., bittaye, m., clutterbuck, e.a., et al. ( ). safety and immunogenicity of the chadox ncov- vaccine against sars-cov- : a preliminary report of a phase / , single- blind, randomised controlled trial. lancet. frenz, b., rämisch, s., borst, a.j., walls, a.c., adolf-bryfogle, j., schief, w.r., veesler, d., and dimaio, f. ( ). automatically fixing errors in glycoprotein structures with rosetta. structure , - .e . ge, x.y., li, j.l., yang, x.l., chmura, a.a., zhu, g., epstein, j.h., mazet, j.k., hu, b., zhang, w., peng, c., et al. ( ). isolation and characterization of a bat sars-like coronavirus that uses the ace receptor. nature , - . guan, y., zheng, b.j., he, y.q., liu, x.l., zhuang, z.x., cheung, c.l., luo, s.w., li, p.h., zhang, l.j., guan, y.j., et al. ( ). isolation and characterization of viruses related to the sars coronavirus from animals in southern china. science , - . gui, m., song, w., zhou, h., xu, j., chen, s., xiang, y., and wang, x. ( ). 
cryo-electron microscopy structures of the sars-cov spike glycoprotein reveal a prerequisite conformational state for receptor binding. cell res , - . haagmans, b.l., al dhahiry, s.h., reusken, c.b., raj, v.s., galiano, m., myers, r., godeke, g.j., jonges, m., farag, e., diab, a., et al. ( ). middle east respiratory syndrome coronavirus in dromedary camels: an outbreak investigation. lancet infect dis , - . hansen, j., baum, a., pascal, k.e., russo, v., giordano, s., wloga, e., fulton, b.o., yan, y., koon, k., patel, k., et al. ( ). studies in humanized mice and convalescent humans yield a sars-cov- antibody cocktail. science. hassan, a.o., case, j.b., winkler, e.s., thackray, l.b., kafai, n.m., bailey, a.l., mccune, b.t., fox, j.m., chen, r.e., alsoussi, w.b., et al. ( a). a sars-cov- infection model in mice demonstrates protection by neutralizing antibodies. cell , - .e . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / hassan, a.o., kafai, n.m., dmitriev, i.p., fox, j.m., smith, b.k., harvey, i.b., chen, r.e., winkler, e.s., wessel, a.w., case, j.b., et al. ( b). a single-dose intranasal chad vaccine protects upper and lower respiratory tracts against sars-cov- . cell , - .e . hu, b., zeng, l.p., yang, x.l., ge, x.y., zhang, w., li, b., xie, j.z., shen, x.r., zhang, y.z., wang, n., et al. ( ). discovery of a rich gene pool of bat sars-related coronaviruses provides new insights into the origin of sars coronavirus. plos pathog , e . huang, j., ofek, g., laub, l., louder, m.k., doria-rose, n.a., longo, n.s., imamichi, h., bailer, r.t., chakrabarti, b., sharma, s.k., et al. ( ). 
broad and potent neutralization of hiv- by a gp -specific human antibody. nature , - . hulswit, r.j.g., lang, y., bakkers, m.j.g., li, w., li, z., schouten, a., ophorst, b., van kuppeveld, f.j.m., boons, g.j., bosch, b.j., et al. ( ). human coronaviruses oc and hku bind to -o-acetylated sialic acids via a conserved receptor-binding site in spike protein domain a. proc natl acad sci u s a. huo, j., zhao, y., ren, j., zhou, d., duyvesteyn, h.m.e., ginn, h.m., carrique, l., malinauskas, t., ruza, r.r., shah, p.n.m., et al. ( ). neutralisation of sars-cov- by destruction of the prefusion spike. cell host & microbe. jackson, l.a., anderson, e.j., rouphael, n.g., roberts, p.c., makhene, m., coler, r.n., mccullough, m.p., chappell, j.d., denison, m.r., stevens, l.j., et al. ( ). an mrna vaccine against sars-cov- - preliminary report. n engl j med. kabsch, w. ( ). xds. acta crystallogr d biol crystallogr , - . kallewaard, n.l., corti, d., collins, p.j., neu, u., mcauliffe, j.m., benjamin, e., wachter- rosati, l., palmer-hill, f.j., yuan, a.q., walker, p.a., et al. ( ). structure and function analysis of an antibody recognizing all influenza a subtypes. cell , - . kan, b., wang, m., jing, h., xu, h., jiang, x., yan, m., liang, w., zheng, h., wan, k., liu, q., et al. ( ). molecular evolution analysis and geographic investigation of severe acute respiratory syndrome coronavirus-like virus in palm civets at an animal market and on farms. j virol , - . kaname, y., tani, h., kataoka, c., shiokawa, m., taguwa, s., abe, t., moriishi, k., kinoshita, t., and matsuura, y. ( ). acquisition of complement resistance through incorporation of cd /decay-accelerating factor into viral particles bearing baculovirus gp . j virol , - . ke, z., oton, j., qu, k., cortese, m., zila, v., mckeane, l., nakane, t., zivanov, j., neufeldt, c.j., cerikan, b., et al. ( ). structures and distributions of sars-cov- spike proteins on intact virions. nature. 
king, l.b., west, b.r., moyer, c.l., gilchuk, p., flyak, a., ilinykh, p.a., bombardi, r., hui, s., huang, k., bukreyev, a., et al. ( ). cross-reactive neutralizing human survivor monoclonal antibody bdbv targets the ebolavirus stalk. nat commun , . kirchdoerfer, r.n., cottrell, c.a., wang, n., pallesen, j., yassine, h.m., turner, h.l., corbett, k.s., graham, b.s., mclellan, j.s., and ward, a.b. ( ). pre-fusion structure of a human coronavirus spike protein. nature , - . kirchdoerfer, r.n., wang, n., pallesen, j., wrapp, d., turner, h.l., cottrell, c.a., corbett, k.s., graham, b.s., mclellan, j.s., and ward, a.b. ( ). stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis. sci rep , . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / kong, r., xu, k., zhou, t., acharya, p., lemmin, t., liu, k., ozorowski, g., soto, c., taft, j.d., bailer, r.t., et al. ( ). fusion peptide of hiv- as a site of vulnerability to neutralizing antibody. science , - . ksiazek, t.g., erdman, d., goldsmith, c.s., zaki, s.r., peret, t., emery, s., tong, s., urbani, c., comer, j.a., lim, w., et al. ( ). a novel coronavirus associated with severe acute respiratory syndrome. n engl j med , - . li, h., mendelsohn, e., zong, c., zhang, w., hagan, e., wang, n., li, s., yan, h., huang, h., zhu, g., et al. ( ). human-animal interactions and bat coronavirus spillover potential among rural residents in southern china. biosaf health , - . li, w., shi, z., yu, m., ren, w., smith, c., epstein, j.h., wang, h., crameri, g., hu, z., zhang, h., et al. ( ). 
bats are natural reservoirs of sars-like coronaviruses. science , - . liebschner, d., afonine, p.v., baker, m.l., bunkóczi, g., chen, v.b., croll, t.i., hintze, b., hung, l.w., jain, s., mccoy, a.j., et al. ( ). macromolecular structure determination using x-rays, neutrons and electrons: recent developments in phenix. acta crystallogr d struct biol , - . liu, l., wang, p., nair, m.s., yu, j., rapp, m., wang, q., luo, y., chan, j.f., sahi, v., figueroa, a., et al. ( ). potent neutralizing antibodies against multiple epitopes on sars- cov- spike. nature , - . mccoy, a.j., grosse-kunstleve, r.w., adams, p.d., winn, m.d., storoni, l.c., and read, r.j. ( ). phaser crystallographic software. j appl crystallogr , - . mclellan, j.s., chen, m., joyce, m.g., sastry, m., stewart-jones, g.b., yang, y., zhang, b., chen, l., srivatsan, s., zheng, a., et al. ( ). structure-based design of a fusion glycoprotein vaccine for respiratory syncytial virus. science , - . memish, z.a., mishra, n., olival, k.j., fagbo, s.f., kapoor, v., epstein, j.h., alhakeem, r., durosinloun, a., al asmari, m., islam, a., et al. ( ). middle east respiratory syndrome coronavirus in bats, saudi arabia. emerg infect dis , - . menachery, v.d., yount, b.l., jr., debbink, k., agnihothram, s., gralinski, l.e., plante, j.a., graham, r.l., scobey, t., ge, x.y., donaldson, e.f., et al. ( ). a sars-like cluster of circulating bat coronaviruses shows potential for human emergence. nat med , - . menachery, v.d., yount, b.l., jr., sims, a.c., debbink, k., agnihothram, s.s., gralinski, l.e., graham, r.l., scobey, t., plante, j.a., royal, s.r., et al. ( ). sars-like wiv -cov poised for human emergence. proc natl acad sci u s a , - . millet, j.k., and whittaker, g.r. ( ). murine leukemia virus (mlv)-based coronavirus spike-pseudotyped particle production and infection. bio protoc . mire, c.e., chan, y.p., borisevich, v., cross, r.w., yan, l., agans, k.n., dang, h.v., veesler, d., fenton, k.a., geisbert, t.w., et al. 
( ). a cross-reactive humanized monoclonal antibody targeting fusion glycoprotein function protects ferrets against lethal nipah virus and hendra virus infection. j infect dis. mok, c.k.p., zhu, a., zhao, j., lau, e.h.y., wang, j., chen, z., zhuang, z., wang, y., alshukairi, a.n., baharoon, s.a., et al. ( ). t-cell responses to mers coronavirus infection in people with occupational exposure to dromedary camels in nigeria: an observational cohort study. lancet infect dis. mulligan, m.j., lyke, k.e., kitchin, n., absalon, j., gurtman, a., lockhart, s.p., neuzil, k., raabe, v., bailey, r., swanson, k.a., et al. ( ). phase / study to describe the safety and immunogenicity of a covid- rna vaccine candidate (bnt b ) in adults to years of age: interim report. medrxiv, . . . . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / pallesen, j., wang, n., corbett, k.s., wrapp, d., kirchdoerfer, r.n., turner, h.l., cottrell, c.a., becker, m.m., wang, l., shi, w., et al. ( ). immunogenicity and structures of a rationally designed prefusion mers-cov spike antigen. proc natl acad sci u s a , e - e . park, y.j., walls, a.c., wang, z., sauer, m.m., li, w., tortorici, m.a., bosch, b.j., dimaio, f., and veesler, d. ( ). structures of mers-cov spike glycoprotein in complex with sialoside attachment receptors. nat struct mol biol , - . pettersen, e.f., goddard, t.d., huang, c.c., couch, g.s., greenblatt, d.m., meng, e.c., and ferrin, t.e. ( ). ucsf chimera--a visualization system for exploratory research and analysis. j comput chem , - . 
piccoli, l., park, y.j., tortorici, m.a., czudnochowski, n., walls, a.c., beltramello, m., silacci-fregni, c., pinto, d., rosen, l.e., bowen, j.e., et al. ( ). mapping neutralizing and immunodominant sites on the sars-cov- spike receptor-binding domain by structure- guided high-resolution serology. cell , - .e . pinto, d., park, y.j., beltramello, m., walls, a.c., tortorici, m.a., bianchi, s., jaconi, s., culap, k., zatta, f., de marco, a., et al. ( ). cross-neutralization of sars-cov- by a human monoclonal sars-cov antibody. nature , - . poh, c.m., carissimo, g., wang, b., amrun, s.n., lee, c.y., chee, r.s., fong, s.w., yeo, n.k., lee, w.h., torres-ruesta, a., et al. ( ). two linear epitopes on the sars-cov- spike protein that elicit neutralising antibodies in covid- patients. nat commun , . punjani, a., rubinstein, j.l., fleet, d.j., and brubaker, m.a. ( ). cryosparc: algorithms for rapid unsupervised cryo-em structure determination. nat methods , - . raj, v.s., mou, h., smits, s.l., dekkers, d.h., muller, m.a., dijkman, r., muth, d., demmers, j.a., zaki, a., fouchier, r.a., et al. ( ). dipeptidyl peptidase is a functional receptor for the emerging human coronavirus-emc. nature , - . robert, x., and gouet, p. ( ). deciphering key features in protein structures with the new endscript server. nucleic acids res , w - . rockx, b., corti, d., donaldson, e., sheahan, t., stadler, k., lanzavecchia, a., and baric, r. ( ). structural basis for potent cross-neutralizing human monoclonal antibody protection against lethal human and zoonotic severe acute respiratory syndrome coronavirus challenge. j virol , - . rockx, b., donaldson, e., frieman, m., sheahan, t., corti, d., lanzavecchia, a., and baric, r.s. ( ). escape from human monoclonal antibody neutralization affects in vitro and in vivo fitness of severe acute respiratory syndrome coronavirus. j infect dis , - . 
rogers, t.f., zhao, f., huang, d., beutler, n., burns, a., he, w.t., limbo, o., smith, c., song, g., woehl, j., et al. ( ). isolation of potent sars-cov- neutralizing antibodies and protection from disease in a small animal model. science. rouvinski, a., guardado-calvo, p., barba-spaeth, g., duquerroy, s., vaney, m.c., kikuti, c.m., navarro sanchez, m.e., dejnirattisai, w., wongwiwat, w., haouz, a., et al. ( ). recognition determinants of broadly neutralizing human antibodies against dengue viruses. nature , - . sahin, u., muik, a., derhovanessian, e., vogler, i., kranz, l.m., vormehr, m., baum, a., pascal, k., quandt, j., maurus, d., et al. ( ). concurrent human antibody and t_h type t-cell responses elicited by a covid- rna vaccine. medrxiv, . . . . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / scheid, j.f., mouquet, h., feldhahn, n., seaman, m.s., velinzon, k., pietzsch, j., ott, r.g., anthony, r.m., zebroski, h., hurley, a., et al. ( ). broad diversity of neutralizing antibodies isolated from memory b cells in hiv-infected individuals. nature , - . sesterhenn, f., yang, c., bonet, j., cramer, j.t., wen, x., wang, y., chiang, c.i., abriata, l.a., kucharska, i., castoro, g., et al. ( ). de novo protein design enables the precise induction of rsv-neutralizing antibodies. science . seydoux, e., homad, l.j., maccamy, a.j., parks, k.r., hurlburt, n.k., jennewein, m.f., akins, n.r., stuart, a.b., wan, y.-h., feng, j., et al. ( ). characterization of neutralizing antibodies from a sars-cov- infected individual. biorxiv, . . . . siegel, r.w. ( ). antibody affinity optimization using yeast cell surface display. 
methods mol biol , - . snijder, j., ortego, m.s., weidle, c., stuart, a.b., gray, m.d., mcelrath, m.j., pancera, m., veesler, d., and mcguire, a.t. ( ). an antibody targeting the fusion machinery neutralizes dual-tropic infection and defines a site of vulnerability on epstein-barr virus. immunity , - .e . song, g., he, w.-t., callaghan, s., anzanello, f., huang, d., ricketts, j., torres, j.l., beutler, n., peng, l., vargas, s., et al. ( ). cross-reactive serum and memory b cell responses to spike protein in sars-cov- and endemic coronavirus infection. biorxiv, . . . . suloway, c., pulokas, j., fellmann, d., cheng, a., guerra, f., quispe, j., stagg, s., potter, c.s., and carragher, b. ( ). automated molecular microscopy: the new leginon system. j struct biol , - . tegunov, d., and cramer, p. ( ). real-time cryo-electron microscopy data preprocessing with warp. nat methods , - . ter meulen, j., van den brink, e.n., poon, l.l., marissen, w.e., leung, c.s., cox, f., cheung, c.y., bakker, a.q., bogaards, j.a., van deventer, e., et al. ( ). human monoclonal antibody combination against sars coronavirus: synergy and coverage of escape mutants. plos med , e . tortorici, m.a., beltramello, m., lempp, f.a., pinto, d., dang, h.v., rosen, l.e., mccallum, m., bowen, j., minola, a., jaconi, s., et al. ( ). ultrapotent human antibodies protect against sars-cov- challenge via multiple mechanisms. science , - . tortorici, m.a., and veesler, d. ( ). structural insights into coronavirus entry. adv virus res , - . tortorici, m.a., walls, a.c., lang, y., wang, c., li, z., koerhuis, d., boons, g.j., bosch, b.j., rey, f.a., de groot, r.j., et al. ( ). structural basis for human coronavirus attachment to sialic acid receptors. nat struct mol biol , - . turoňová, b., sikora, m., schürmann, c., hagen, w.j.h., welsch, s., blanc, f.e.c., von bülow, s., gecht, m., bagola, k., hörner, c., et al. ( ). in situ structural analysis of sars-cov- spike reveals flexibility mediated by three hinges. 
science. walker, l.m., huber, m., doores, k.j., falkowska, e., pejchal, r., julien, j.p., wang, s.k., ramos, a., chan-hui, p.y., moyle, m., et al. ( ). broad neutralization coverage of hiv by multiple highly potent antibodies. nature , - . walker, l.m., phogat, s.k., chan-hui, p.y., wagner, d., phung, p., goss, j.l., wrin, t., simek, m.d., fling, s., mitcham, j.l., et al. ( ). broad and potent neutralizing antibodies from an african donor reveal a new hiv- vaccine target. science , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / walls, a.c., fiala, b., schäfer, a., wrenn, s., pham, m.n., murphy, m., tse, l.v., shehata, l., o'connor, m.a., chen, c., et al. ( a). elicitation of potent neutralizing antibody responses by designed protein nanoparticle vaccines for sars-cov- . cell , - .e . walls, a.c., park, y.j., tortorici, m.a., wall, a., mcguire, a.t., and veesler, d. ( b). structure, function, and antigenicity of the sars-cov- spike glycoprotein. cell , - .e . walls, a.c., tortorici, m.a., bosch, b.j., frenz, b., rottier, p.j.m., dimaio, f., rey, f.a., and veesler, d. ( a). cryo-electron microscopy structure of a coronavirus spike glycoprotein trimer. nature , - . walls, a.c., tortorici, m.a., frenz, b., snijder, j., li, w., rey, f.a., dimaio, f., bosch, b.j., and veesler, d. ( b). glycan shield and epitope masking of a coronavirus spike protein observed by cryo-electron microscopy. nat struct mol biol , - . walls, a.c., tortorici, m.a., snijder, j., xiong, x., bosch, b.j., rey, f.a., and veesler, d. ( ). tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion. 
proc natl acad sci u s a , - . walls, a.c., xiong, x., park, y.j., tortorici, m.a., snijder, j., quispe, j., cameroni, e., gopal, r., dai, m., lanzavecchia, a., et al. ( ). unexpected receptor functional mimicry elucidates activation of coronavirus fusion. cell , - .e . wang, c., li, w., drabek, d., okba, n.m.a., van haperen, r., osterhaus, a.d.m.e., van kuppeveld, f.j.m., haagmans, b.l., grosveld, f., and bosch, b.j. ( a). a human monoclonal antibody blocking sars-cov- infection. nat commun , . wang, c., van haperen, r., gutiérrez-Álvarez, j., li, w., okba, n.m.a., albulescu, i., widjaja, i., van dieren, b., fernandez-delgado, r., sola, i., et al. ( b). isolation of cross-reactive monoclonal antibodies against divergent human coronaviruses that delineate a conserved and vulnerable site on the spike protein. biorxiv, . . . . wang, m., yan, m., xu, h., liang, w., kan, b., zheng, b., chen, h., zheng, h., xu, y., zhang, e., et al. ( ). sars-cov infection in a restaurant from palm civet. emerg infect dis , - . wang, n., li, s.y., yang, x.l., huang, h.m., zhang, y.j., guo, h., luo, c.m., miller, m., zhu, g., chmura, a.a., et al. ( ). serological evidence of bat sars-related coronavirus infection in humans, china. virol sin , - . wang, r.y., song, y., barad, b.a., cheng, y., fraser, j.s., and dimaio, f. ( ). automated structure refinement of macromolecular assemblies from cryo-em maps using rosetta. elife . wec, a.z., wrapp, d., herbert, a.s., maurer, d.p., haslwanter, d., sakharkar, m., jangra, r.k., dieterle, m.e., lilov, a., huang, d., et al. ( ). broad neutralization of sars-related viruses by human monoclonal antibodies. science. west, b.r., moyer, c.l., king, l.b., fusco, m.l., milligan, j.c., hui, s., and saphire, e.o. ( ). structural basis of pan-ebolavirus neutralization by a human antibody against a conserved, yet cryptic epitope. mbio . 
whittle, j.r., zhang, r., khurana, s., king, l.r., manischewitz, j., golding, h., dormitzer, p.r., haynes, b.f., walter, e.b., moody, m.a., et al. ( ). broadly neutralizing human antibody that recognizes the receptor-binding pocket of influenza virus hemagglutinin. proc natl acad sci u s a , - . wrapp, d., wang, n., corbett, k.s., goldsmith, j.a., hsieh, c.l., abiona, o., graham, b.s., and mclellan, j.s. ( ). cryo-em structure of the -ncov spike in the prefusion conformation. science , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / wu, x., yang, z.y., li, y., hogerkorp, c.m., schief, w.r., seaman, m.s., zhou, t., schmidt, s.d., wu, l., xu, l., et al. ( ). rational design of envelope identifies broadly neutralizing human monoclonal antibodies to hiv- . science , - . yang, x.l., hu, b., wang, b., wang, m.n., zhang, q., zhang, w., wu, l.j., ge, x.y., zhang, y.z., daszak, p., et al. ( ). isolation and characterization of a novel bat coronavirus closely related to the direct progenitor of severe acute respiratory syndrome coronavirus. j virol , - . yu, j., tostanoski, l.h., peter, l., mercado, n.b., mcmahan, k., mahrokhian, s.h., nkolola, j.p., liu, j., li, z., chandrashekar, a., et al. ( ). dna vaccine protection against sars- cov- in rhesus macaques. science. yuan, m., wu, n.c., zhu, x., lee, c.d., so, r.t.y., lv, h., mok, c.k.p., and wilson, i.a. ( ). a highly conserved cryptic epitope in the receptor-binding domains of sars-cov- and sars-cov. science. yuan, y., cao, d., zhang, y., ma, j., qi, j., wang, q., lu, g., wu, y., yan, j., shi, y., et al. ( ). 
cryo-em structures of mers-cov and sars-cov spike glycoproteins reveal the dynamic receptor binding domains. nat commun , . zaki, a.m., van boheemen, s., bestebroer, t.m., osterhaus, a.d., and fouchier, r.a. ( ). isolation of a novel coronavirus from a man with pneumonia in saudi arabia. n engl j med , - . zhang, h., wang, g., li, j., nie, y., shi, x., lian, g., wang, w., yin, x., zhao, y., qu, x., et al. ( ). identification of an antigenic determinant on the s domain of the severe acute respiratory syndrome coronavirus spike glycoprotein capable of inducing neutralizing antibodies. j virol , - . zheng, z., monteil, v.m., maurer-stroh, s., yew, c.w., leong, c., mohd-ismail, n.k., cheyyatraivendran arularasu, s., chow, v.t.k., lin, r.t.p., mirazimi, a., et al. ( ). monoclonal antibodies for the s subunit of spike of sars-cov- cross-react with the newly- emerged sars-cov- . euro surveill . zhou, p., yang, x.l., wang, x.g., hu, b., zhang, l., zhang, w., si, h.r., zhu, y., li, b., huang, c.l., et al. ( ). a pneumonia outbreak associated with a new coronavirus of probable bat origin. nature. zhou, t., georgiev, i., wu, x., yang, z.y., dai, k., finzi, a., kwon, y.d., scheid, j.f., shi, w., xu, l., et al. ( ). structural basis for broad and potent neutralization of hiv- by antibody vrc . science , - . zhu, f.c., li, y.h., guan, x.h., hou, l.h., wang, w.j., li, j.x., wu, s.p., wang, b.s., wang, z., wang, l., et al. ( a). safety, tolerability, and immunogenicity of a recombinant adenovirus type- vectored covid- vaccine: a dose-escalation, open-label, non-randomised, first-in-human trial. lancet , - . zhu, n., zhang, d., wang, w., li, x., yang, b., song, j., zhao, x., huang, b., shi, w., lu, r., et al. ( b). a novel coronavirus from patients with pneumonia in china, . n engl j med. zhu, z., dimitrov, a.s., bossart, k.n., crameri, g., bishop, k.a., choudhry, v., mungall, b.a., feng, y.r., choudhary, a., zhang, m.y., et al. ( ). 
potent neutralization of hendra and nipah viruses by human monoclonal antibodies. j virol , - . zivanov, j., nakane, t., forsberg, b.o., kimanius, d., hagen, w.j., lindahl, e., and scheres, s.h. ( ). new tools for automated high-resolution cryo-em structure determination in relion- . elife . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / zivanov, j., nakane, t., and scheres, s.h.w. ( ). a bayesian approach to beam-induced motion correction in cryo-em single-particle analysis. iucrj , - . zost, s.j., gilchuk, p., case, j.b., binshtein, e., chen, r.e., nkolola, j.p., schäfer, a., reidy, j.x., trivette, a., nargi, r.s., et al. ( ). potently neutralizing and protective human antibodies against sars-cov- . nature , - . .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . 
extracellular endosulfatase sulf- harbours a chondroitin/dermatan sulfate chain that modulates its enzyme activity

short title: extracellular endosulfatase sulf- is a new proteoglycan

authors and affiliation

rana el masri$, amal seffouh$, caroline roelants, ilham seffouh, evelyne gout, julien pérard, fabien dalonneau, kazuchika nishitsuji, fredrik noborn, mahnaz nikpour, göran larson, yoann crétinon, kenji uchimura, régis daniel, hugues lortat-jacob, odile filhol and romain r. vivès*

from univ. grenoble alpes, cnrs, cea, ibs, grenoble, france; inovarion, paris, france; université paris-saclay, univ evry, cnrs, lambe, , evry-courcouronnes, france; univ-grenoble alpes, cnrs, irig - diese - cbm, cea-grenoble, grenoble, france; department of biochemistry, wakayama medical university, wakayama, - japan; department of laboratory medicine, university of gothenburg, sahlgrenska university hospital, gothenburg, sweden; univ. lille, cnrs, umr - ugsf - unité de glycobiologie structurale et fonctionnelle, f- lille, france; université grenoble alpes, inserm, cea, irig-biology of cancer and infection, umr_s , f- grenoble.

$ these authors contributed equally to this work.

* correspondence should be addressed to: romain r. vivès, ibs, avenue des martyrs cs , grenoble cedex , france. phone: (+ ) . . . . ; fax: (+ ) . . . . ; email: romain.vives@ibs.fr, and for in vivo studies to odile filhol, cea-grenoble, avenue des martyrs, grenoble cedex , france. phone: (+ ) . . . . ; fax: (+ ) . . . . ; email: odile.filhol-cochet@cea.fr.

biorxiv preprint doi: https://doi.org/ . / . . . ; this version posted january , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nd . international license (http://creativecommons.org/licenses/by-nd/ . /).

key words

sulfatase – enzyme – glycosaminoglycan – proteoglycan – post-translational modifications

author contributions

relm and as performed most of the biochemical experiments, with additional contributions from eg, fd and yc. cr and relm performed in vivo experiments and data processing under the supervision of of. jp performed saxs analysis and modelling. kn and ku performed biochemical analysis of hsulf- endogenous expression. all the glycoproteomics lc-ms/ms analyses were prepared, performed and interpreted by fn, mn and gl in collaboration with the bioms proteomics core facility at the university of gothenburg. rd and is performed ms analysis. rv, of and hlj interpreted the data and supervised experimental work. rv, relm, ku and of wrote the manuscript with the help of all co-authors.

abstract

sulfs represent a class of unconventional sulfatases, which differ from all other members of the sulfatase family by their structures, catalytic features and biological functions. through their specific endosulfatase activity in the extracellular milieu, sulfs provide an original post-synthetic regulatory mechanism for heparan sulfate complex polysaccharides and have been implicated in multiple physiopathological processes, including cancer. however, sulfs remain poorly characterized enzymes, with major discrepancies regarding their in vivo functions. here we show that human sulf- (hsulf- ) features a unique polysaccharide post-translational modification.
we identified a chondroitin/dermatan sulfate glycosaminoglycan (gag) chain, attached to the enzyme substrate binding domain. we found that this gag chain affects enzyme/substrate recognition and tunes hsulf- activity both in vitro and in vivo, the latter in a mouse model of tumorigenesis and metastasis. in addition, we showed that mammalian hyaluronidase acted as a promoter of hsulf- activity by digesting its gag chain. in conclusion, our results highlight hsulf- as a unique proteoglycan enzyme and its newly-identified gag chain as a critical non-catalytic modulator of the enzyme activity. these findings contribute to clarifying the conflicting data on the activities of the sulfs and introduce a new paradigm into the study of these enzymes.

introduction

eukaryotic sulfatases have historically been defined as intracellular exoenzymes participating in the metabolism of a large array of sulfated substrates such as steroids, glycolipids, and glycosaminoglycans (gags), through hydrolysis of sulfate ester bonds under acidic conditions (hanson et al., ). however, the field took a dramatic turn two decades ago, with the discovery of the sulfs (dhoot et al., ; morimoto-tomita et al., ). unlike all other sulfatases, sulfs were shown to be extracellular endosulfatases that catalyzed the specific -o-desulfation of cell-surface and extracellular matrix heparan sulfate (hs), a polysaccharide with vast protein binding properties and biological functions (el masri et al., ; li and kusche-gullberg, ; sarrazin et al., ).
also unlike all other sulfatases, sulfs could not be related to a straightforward metabolic function, but rapidly emerged as a novel major regulatory mechanism of hs biological activities, with roles in many physiopathological processes, including embryonic development, tissue homeostasis and cancer (bret et al., ; rosen and lemjabbar-alaoui, ; vives et al., ). sulfs share a common molecular organization (figure s ). the furin-processed mature form features a general sulfatase-conserved n-terminal catalytic domain (cat) including the enzyme active site (and notably, the catalytic n-formylglycine (fgly) converted cysteine residue), and a unique highly basic hydrophilic domain (hd), which shares no homology with any other known protein and is responsible for high affinity binding to hs substrates (ai et al., ; frese et al., ; seffouh et al., , a; tang and rosen, ). sulfs display a number of post-translational modifications (ptms) (morimoto-tomita et al., ). furin cleavage (tang and rosen, ) and n-glycosylations (ambasta et al., ; seffouh et al., b) may be dispensable for the enzyme activity, but play a role in the enzyme attachment to the cell surface, while conversion of c into an fgly residue is a hallmark of all sulfatases and is essential for the catalytic activity (dierks et al., ). recent studies reported that human sulfs (hsulfs) catalyzed the -o-desulfation of hs through an original, processive and orientated mechanism (seffouh et al., ), and that substrate recognition by the enzyme hd domain involved multiple, highly dynamic, non-conventional interactions (harder et al., ; walhorn et al., ). however, despite increasing interest, sulfs remain highly elusive enzymes. little is known about their molecular structures, catalytic mechanisms and substrate specificities.
our limited understanding of these enzymes is well illustrated by the wealth of conflicting data in the literature, reporting major discrepancies between in vitro and in vivo data, according to the biological system or the enzyme isoforms considered. this is particularly clear in cancer, where both anti-oncogenic and pro-oncogenic effects of the sulfs have been reported (rosen and lemjabbar-alaoui, ; vives et al., ; yang et al., ).
results
hsulf- is an enzyme modified with a cs/ds chain
recently, we achieved for the first time high yield expression and purification of hsulf- , which paved the way to progress in the biochemical characterization of this enzyme (seffouh et al., a). surprisingly, the purification step of size-exclusion chromatography highlighted an unexpectedly high apparent molecular weight (amw) for the enzyme (> ~ kda, for a theoretical molecular weight of da, figure a), although possible protein aggregation or high order oligomerization were ruled out by quality control negative staining electron microscopy (seffouh et al., a). notably, we also failed to detect the c-terminal chain containing the enzyme hd domain using page analysis (figure d, lane ), even though the presence of both chains was ascertained by edman n-terminal sequencing (seffouh et al., a). in line with this, we previously reported unusually weak mass spectrometry ionization efficiency of the hsulf- c-terminal chain (seffouh et al., b).
small angle x-ray scattering (saxs) analysis of the protein yielded guinier plots and pair-distribution function in accordance with a dmax of +/- nm, suggesting an elongated molecular shape with an amw of ~ kda, which supported our size-exclusion chromatography data (figure s a-e). furthermore, results suggested the presence of two distinct domains within hsulf- : a globular domain and an extended, flexible, probably partially unfolded region. interestingly, similar analysis performed on an hd-devoid hsulf- variant (hsulf- hd) showed only the globular domain (figure s f-k), whose nm size matched that of a modelled structure of the cat domain (figure s k). however, it seemed unlikely that the hd domain on its own could account for the second, large flexible region. we thus initially speculated that the enzyme could have been purified in complex with hs substrate polysaccharide chains. to test this, hsulf- was treated with heparinases (to digest potentially bound hs substrate) or with chondroitinase abc (to digest non-substrate gags of cs/ds types) prior to size-exclusion chromatography. results showed no effect of the heparinase treatment (figure s b), while digestion with chondroitinase abc dramatically delayed hsulf- size-exclusion chromatography elution time, thus indicating the presence of cs/ds associated with the enzyme (figure b). attempts to dissociate the hsulf- -cs/ds complex with high nacl concentrations were unsuccessful (figure s c), thereby suggesting covalent linkage between the polysaccharide and the protein.
in addition, chondroitinase abc treatment allowed for detection of a broad additional band of ~ kda apparent molecular weight (figure d). this band was assigned to the enzyme c-terminal subunit, as confirmed by western blot (wb) analysis (figure d). of note, the hsulf- hd variant (lacking the hd but not the enzyme c-terminal region) did not exhibit such a band on page (seffouh et al., a). we therefore concluded from these results that a cs/ds chain was covalently attached to the hsulf- hd domain. the presence of such a chain could account for the high amw determined by saxs and size-exclusion chromatography, and could also hinder migration/detection of the c-terminal subunit in page/wb. gags are covalently bound to specific glycoproteins termed proteoglycans (pgs), through a specific attachment site involving the serine residue of a sg dipeptide, primed by a xylose residue (esko and zhang, ). xylosides are widely used inhibitors of gag assembly on such motifs (chua and kuberan, ). accordingly, size-exclusion chromatography analysis of hsulf- expressed in xyloside-treated hek cells showed a dramatic reduction of the high amw form and a concomitant increase of a form eluting as the chondroitinase abc-treated hsulf- (figure s d), further supporting the presence of a covalently attached gag chain. examination of the hsulf- amino-acid sequence showed two sg dipeptides: s g and s g. we thus expressed and produced an hsulf- variant lacking these two motifs (hsulf- Δsg). the hsulf- Δsg variant eluted at the same time as the chondroitinase abc-treated wild type (wt) hsulf- (figure c), with restored detection of the c-terminal chain by coomassie blue-stained page and wb analysis (figure d). both sg dipeptides are located within the enzyme hd domain, but on each side of the furin cleavage site (figure s ).
as our page/wb data located the cs/ds chain on the c-terminal subunit, we speculated that the s g motif downstream of the furin cleavage sites was the actual gag attachment site on hsulf- . to assess this, we performed single mutations of the first and second sites. size-exclusion analysis of the resulting variants validated the presence of a cs/ds-type gag chain on the s g, but not s g, dipeptide motif (figure s e and s f). finally, we also confirmed that the presence of n- and c-terminal tags did not bias the results, as tobacco etch virus (tev) protease digestion did not affect size-exclusion chromatography elution times of hsulf- wt, chondroitinase abc-treated hsulf- wt or hsulf- Δsg (figure s ). to characterize the hsulf- gag chain further, we analyzed both hsulf- wt and hsulf- Δsg variants by mass spectrometry. maldi-tof ms analysis of hsulf- Δsg showed a major peak at m/z , that we assigned to the doubly charged ion [m+ h] + of the whole hsulf- variant. interestingly, the corresponding singly and triply charged ions at m/z , and , were also observed. based on this distribution of multiply charged ions, an average experimental mass value of , ± g.mol- was determined for the whole hsulf- Δsg. hsulf- Δsg thus exhibited a ~ , da lower average mass value compared to the hsulf- wt average molecular weight previously determined by maldi ms ( , da) (seffouh et al., b). this mass difference likely corresponds to the mass contribution of the gag chain (figure s ).
covalent linkage of a ~ kda sulfated gag polysaccharide to hsulf- would result in a huge increase of the hydrodynamic volume, which is consistent with the high amw observed in size-exclusion chromatography (figure a) and in saxs analysis (figure s ). altogether, these data provide converging evidence that hsulf- features a unique ptm at the level of its hd domain, corresponding to a covalently-linked cs/ds polysaccharide chain. this result thus highlights hsulf- as a new member of the large pg family.
endogenous expression of gag-modified hsulf-
we identified a gag chain on hsulf- when overexpressed in hek transfected cells. to confirm the physiological relevance of these findings, we sought to demonstrate the presence of this gag chain on the naturally occurring enzyme. to this end, we first used a strategy originally designed to identify new proteoglycans (noborn et al., ). nano-scale liquid chromatography ms/ms analysis of trypsin- and chondroitinase abc-digested pgs isolated from the culture medium of human neuroblastoma sh-sy y cells led to the identification of a hsulf- specific, amino acid long glycopeptide highlighting a cs/ds attachment site on the s residue of hsulf- (figure s ). to get further insights into gag modification of endogenously expressed hsulf- , we analyzed hsulf- expressed by two additional cell types: mcf human breast cancer cells and human umbilical vein endothelial cells (huvecs). detection and characterization of endogenous sulfs were challenging. expression yields are usually low, and wb immunodetection yields different band patterns, depending on cells, ptms and furin cleavages (see figure s ). to address these issues, we took advantage of the high n-glycosylation content of the sulfs and used a protocol of culture medium enrichment based on lectin affinity chromatography. we analyzed enriched conditioned medium by wb, using antibodies raised against either hsulf- n-terminal (h .
) (uchimura et al., ) or c-terminal ( b ) (lemjabbar-alaoui et al., ) subunits (figure s ). wb analysis of hsulf- secreted in the conditioned medium of mcf using b yielded broad diffuse bands, respectively in the ~ - and ~ - kda size range (figure a, lane ). we attributed these bands to cs/ds-conjugated c-terminal subunit fragments and to a cs/ds-conjugated full-length, furin-uncleaved hsulf- form, respectively. the presence of the cs/ds chain was confirmed by chondroitinase abc treatment, which converted the broad bands into two sets of well-defined bands, corresponding to gag-depolymerized c-terminal fragments and full-length unprocessed forms (figure a, lane ). these changes could definitely be attributed to the cleavage of the gag chains, as the treatment with heat-inactivated chondroitinase abc showed the same band pattern as the non-treated samples (figure a, lane ). analysis of hsulf- from huvec pre-purified conditioned medium yielded similar results (figure b), but with remarkable differences. first, detected signals were markedly less intense. although quantification by immunodetection should be cautiously considered, this suggested that expression levels of hsulf- were different between mcf and huvec cells. we also noticed discrepancies regarding furin processing activity (distinct ratios between processed and unprocessed forms). wb analysis using the n-terminal reactive h .
antibody confirmed the presence of the n-terminal subunit being unaffected by the chondroitinase abc treatment (~ kda size), and supported the identification of full-length unprocessed forms within the analyzed samples (~ - kda size range, figure c and d). finally, although wb analysis showed similar band patterns for these two cell lines (figure a and b), gag-conjugated fragments from huvec cells migrated at slightly lower amw (~ - and ~ - kda, figure b, lane ). in addition, we also detected bands corresponding to gag-lacking fragments, for hsulf- from huvec at least (c-ter and unprocessed enzyme, see figure c and d). altogether, these results confirm that endogenously expressed hsulf- harbors a cs/ds chain and indicate the co-existence of gag-modified and gag-free forms. furthermore, our data suggest cell-dependent specificities of hsulf- ptm (e.g. furin processing, and the gag structure), which could provide additional regulation/diversity of the enzyme structural and functional features.
hsulf- gag chain modulates enzyme activity in vitro
the hd is a major functional domain of the sulfs. this domain is required for the enzyme high affinity binding to hs substrates and for processive -o-endosulfatase activity (frese et al., ; seffouh et al., ; tang and rosen, ). we thus anticipated that the presence of a gag chain on this domain would significantly affect the enzyme substrate recognition and activity. to study this, we assessed hsulf- wt and hsulf- Δsg -o-endosulfatase activities, using heparin as a surrogate of hs. we analyzed the disaccharide composition of hsulf- treated heparin and measured the content of [ua( s)-glcns( s)] trisulfated disaccharide, which is the enzyme’s primary substrate (frese et al., ; pempe et al., ; seffouh et al., ). results showed enhanced digestion of the disaccharide substrate with hsulf- Δsg versus hsulf- wt, and a concomitant increase in the [ua( s)-glcns] disaccharide product (figure a).
we speculated that the observed increase in endosulfatase activity could be due to an improved enzyme-substrate interaction. we thus analyzed the binding of hsulf- wt and hsulf- Δsg to surface-coated biotinylated heparin. results showed a modest increase in hsulf- Δsg binding to heparin, with calculated kds of . ± . nm and . ± . nm for hsulf- wt and hsulf- Δsg, respectively (figure b). the second functional domain of the sulfs, cat, comprises the enzyme active site. cat alone is unable to catalyze hs -o-desulfation, but it exhibits a generic arylsulfatase activity that can be measured using the fluorogenic pseudosubstrate -methyl umbelliferyl sulfate ( -mus). surprisingly, hsulf- Δsg showed greater (~ . fold increase) arylsulfatase activity than that of hsulf- wt (figure c). we thus concluded from these observations that the newly identified cs/ds chain of hsulf- regulates the enzyme activity, both by modulating hd domain/substrate interaction and by hindering access to the active site. we hypothesize that these effects could be due to electrostatic hindrance preventing the interaction of the enzyme functional domains with sulfated substrates. aside from enzyme activity, the interaction of hs with the sulf hd domain is also involved in the retention of the enzyme at the cell surface, a mechanism that may also govern diffusion and bioavailability of the enzyme within tissues (frese et al., ). to investigate this, we analyzed the interaction of hsulf- wt and hsulf- Δsg with cellular hs by facs, using human amniotic epithelial wish cells as a model.
again, results showed a significant increase in binding of the hsulf- form lacking the cs/ds chain to wish cells (figure d). these data therefore suggest that the hsulf- gag chain may also influence enzyme retention at the cell surface. as the gag-lacking hsulf- Δsg variant exhibited enhanced hs -o-endosulfatase activity, we sought to investigate whether enzymatic removal of the hsulf- gag chain would lead to a similar effect. hyaluronidases are the only mammalian enzymes to exhibit chondroitinase activity (csoka et al., ; bilong m. et al., manuscript in revision). we found that treatment of hsulf- wt with hyaluronidase allowed wb detection of the ~ kda band corresponding to the enzyme c-terminal subunit (figure e), and boosted heparin -o-desulfation (figure f), with an efficiency similar to that of the hsulf- Δsg variant.
hsulf- gag chain modulates tumor growth and metastasis in vivo
we next investigated the effect of the hsulf- gag chain on tumor progression in vivo, using a mouse xenograft model of tumorigenesis and metastasis. for this, we overexpressed by lentiviral transduction either hsulf- wt or hsulf- Δsg in mda-mb- , a human breast cancer cell line that does not express any hsulfs endogenously (peterson et al., ). after selection, stable expression of sulfs in transduced cells was confirmed by wb. in contrast, cells transduced with an unrelated protein (dsred) showed no endogenous expression of the sulfs (figure s a). we also validated the endosulfatase activity of hsulf- produced in mda-mb- , by treating heparin with concentrated conditioned medium from transduced cells.
results showed no activity for the medium of dsred-transfected cells, while conditioned medium from either hsulf- wt or hsulf- Δsg transduced cells efficiently digested heparin, as shown by the increase of [ua( s)-glcns] disaccharide product (figure s b). again, results suggested higher endosulfatase activity for hsulf- Δsg transduced cells. finally, we confirmed the presence of the cs/ds chain on mda-mb- hsulf- wt, by treating conditioned medium with chondroitinase abc, followed by wb analysis (figure s c). of note, results also showed a significant proportion of gag-free and full-length, unprocessed forms of hsulf- in the chondroitinase abc-untreated conditioned medium (figure s c, lane ). dsred, hsulf- wt or hsulf- Δsg transduced mda-mb- cells were then xenografted into the mammary gland of mice with severe combined immunodeficiency (scid). tumor volumes were monitored every days and xenografted scid mice were euthanized when the first tumors reached cm in size (day ), in accordance with the european ethical rule on animal experimentation. primary tumors, along with lymph nodes and lungs, were collected for further analysis. results showed little effect of hsulf- wt expression on the tumor size (figure a). our data are therefore in disagreement with previous work, which reported either anti-oncogenic (peterson et al., ) or pro-oncogenic (zhu et al., ) effects of hsulf- wt expression in mda-mb- cells using similar in vivo mouse models. however, it should be noted that a major difference between these three studies is the size of xenograft tumors achieved (~ . cm and - cm respectively, in the studies mentioned above). such conflicting data clearly exemplify the complexity of hsulf regulatory functions and the possible bias that could result from the experimental design. in contrast, expression of the hsulf- Δsg variant significantly promoted tumor growth, in comparison to both dsred and hsulf- wt conditions.
notably, wb analysis of tumors confirmed sustained expression of the enzyme in both hsulf- wt and hsulf- Δsg (but not dsred) conditions (supp. figure s d). histological analysis of tumor sections using an eosin/hematoxylin staining showed greater necrotic area in control tumors than in hsulf- expressing tumors (figure b). as necrosis is a hallmark of hypoxia in growing tumors that is mainly due to lack of angiogenesis, we studied tumor vascularization using α smooth muscle actin (αsma) immunostaining. results showed no apparent changes in αsma labelling upon hsulf- wt expression. however, tumor vascularization was increased in hsulf- Δsg tumors (figure s a and s b). we next analyzed lymph nodes and lungs for secondary tumors. lung, which is a primary target for metastasis in this tumor model, was affected in all conditions (figure c). however, size of metastasis-induced secondary tumors was significantly greater in hsulf- Δsg expressing tumors (figure d and e). moreover, tumor metastasis could be observed with higher frequency in lateral (left axillary ln) but also contra-lateral (right axillary ln) lymph nodes for hsulf- expressing tumors (figure s c). the cs/ds chain borne by hsulf- is thus functionally relevant in vivo, at least in the context of cancer, where it attenuated the effect of the enzyme on tumor growth and metastatic invasion.
in contrast, forms of hsulf- lacking the cs/ds chain stimulate the metastatic properties of cancer cells, thus highlighting the importance of hsulf- gag modification status for considering the enzyme as a potential therapeutic target for treating human cancer.
discussion
in this study, we have shown the presence of a covalently-linked cs/ds chain on extracellular sulfatase hsulf- , and demonstrated its functional relevance. although cs/ds chains have been previously identified on the mucin-like domain of adamts (mead et al., ), the identification of hsulf- as a new secreted pg is unprecedented. it is well established that gag chains provide most of pg’s biological activities, usually through the ability of the polysaccharide to bind and modulate a wide array of structural and signaling proteins. however, we show here that the gag chain present on hsulf- directly modulates its enzyme activity. these findings open new and unexpected perspectives in the understanding of the enzyme biological functions, and should help clarify discrepancies in the literature. here, we first demonstrated that the cs/ds chain modulates hsulf- -o-endosulfatase activity in vitro, most likely by competing with sulfated substrates for hs binding site occupancy, and/or through electrostatic hindrance. in support of this, we located this gag chain on the hd domain of hsulf- , which is critical for substrate binding. however, in an in vivo biological context, we anticipate that the hsulf- gag chain could also modulate the enzyme function through other mechanisms. gags bind a wide array of cell-surface and extracellular matrix proteins. the hsulf- cs/ds chain could therefore promote the recruitment of gag-binding proteins, with potentially significant functional consequences. these interactions may involve hsulf- in the regulation of matricrine signaling activities, or influence the diffusion and distribution of the enzyme within tissues.
in line with this, our facs-based cell-binding assay suggested enhanced attachment to the cell surface of the hsulf- Δsg variant vs the hsulf- wt form. consequently, in vivo hsulf- “gagosylation” status may not only influence the extent of hs -o-desulfation, but also the range of the enzyme activity and access to specific hs subsets in tissues.
we thus next analyzed the effect of the hsulf- gag chain in an in vivo mouse xenograft model of cancer. our data showed that overexpression of the hsulf- Δsg variant, which has enhanced activity in vitro, significantly promoted tumor growth, vascularization and metastasis in vivo. the development of metastasis is a multistep process, which is a major factor of poor prognosis in cancer. although the biological mechanisms that drive metastasis are relatively unknown, the prometastatic role of cancer cell-derived matrisome proteins has recently been highlighted (tian et al., ). based on our data, we speculate that hsulf- may also participate in the extracellular matrix remodeling process, and could provide an additional target to act on metastasis development. furthermore, hsulf- “gagosylation” status could serve as a new metastasis-promoting marker. beyond the field of cancer, this concept of “gagosylation” status should prove to be critical for studying the biological functions of the sulfs, as this may confer to the enzyme a tremendous level of functional and structural heterogeneity. first, it is well known that the structure and binding properties of gags vary according to the biological context.
we therefore anticipate further regulation of hsulf- catalytic activity and/or diffusion properties, depending on structural features of its cs/ds chain. in addition, our data highlighted differences in hsulf- furin processing amongst analyzed cell types. this could be functionally relevant, as furin maturation may affect hsulf- cell surface/extracellular localization as well as in vivo activity (tang and rosen, ). as gags have been previously shown to promote furin activity (pasquato et al., ), we could thus hypothesize that the presence of the newly identified hsulf- cs/ds chain in the vicinity of the two major furin cleavage sites may also influence hsulf- maturation status. here, we used a mutagenesis-generated gag-lacking hsulf- variant in our functional assays. however, our data suggest the co-existence of both gag-conjugated and gag-free hsulf- , as page analysis of gag-conjugated hsulf- fragments from mcf and huvecs yielded distinctly different band patterns (figure ). the balance of expression between these two forms may therefore be critical for the control of the hs -o-desulfation process. the underlying mechanisms are likely to be complex and multifactorial. interestingly, we showed that hyaluronidases could efficiently digest the hsulf- gag chain and enhance its endosulfatase activity (figure e and f). mammalian hyaluronidases are a family of enzymes that catalyze the degradation of hyaluronic acid (ha) and also exhibit the ability to depolymerize cs (csoka et al., ; jedrzejas and stern, ; kaneiwa et al., ). hyaluronidase expression is increased in some cancers (mcatee et al., ), with suggested roles in tumor invasion and tumor-associated inflammation (dominguez gutierrez et al., ; mcatee et al., ). however, their precise contribution remains poorly understood and contradictory. here, we propose a new function for these enzymes, which may provide an activating mechanism of hsulf- , through their ability to “unleash” the enzyme from its gag chain. this perspective calls for a detailed investigation of the interplay of hsulf- and hyaluronidases during tumor progression as well as in other physiopathological conditions. in line with this, we have analyzed in detail the molecular features of the hsulf- gag chain and the effects of hyaluronidase on its structure and activity (seffouh et al., manuscript in preparation). last but not least, analysis of the other human isoform hsulf- showed an absence of any gag chain, at least in our hek overexpressing system (figure s g). “gagosylation” status could thus account for the functional differences reported between these two secreted endosulfatases. in conclusion, we report here a most unexpected ptm of hsulf- , by identifying the presence of a cs/ds chain on the enzyme. our data highlight this gag chain as a novel non-catalytic regulatory element of hsulf- activity, and pave the way to new directions in the study of this highly intriguing enzyme and complex regulatory mechanism of hs activity. finally, it is worth noting that such a structurally and functionally relevant feature as a gag chain on hsulf- has remained overlooked for more than years. beyond the field of the sulfs, our findings therefore strongly encourage reconsidering afresh the importance of ptms in complex enzymatic systems.
material and methods
antibodies against hsulf‑
the epitopes of antibodies against hsulf- are summarized in fig. s .
polyclonal antibody h was newly produced by biotem (apprieu, france), by immunizing rabbits with a mix of peptides derived from hsulf- sequence (c dsgdyklslagrrkklf and t krhwpgapedqddkdg), located within the hd domain, on each side of the furin cleavage site (see fig. s ). consequently, h is specific for the hd domain and recognizes both hsulf- n- and c-terminal subunits. the b monoclonal antibody, which is specific for the hsulf- c-terminal subunit, was purchased from r&d systems (mab ). of note, analysis of whole lysates prepared from cultured cells or tissues with b yields a sharp ~ kda band. this band presumably corresponds to a form in synthesis, such as non furin-processed/gag-unmodified hsulf- . meanwhile, analysis of conditioned medium with b shows multiple bands corresponding to hsulf- unmodified or furin/gag-modified c-terminal subunit. the h . polyclonal antibody, which is specific to the hsulf- n-terminal subunit, was previously described (uchimura et al., ).
production and purification of recombinant wt and mutant hsulf‑
the expression and purification of hsulf- and mutants were performed as described previously (seffouh et al., a). briefly, freestyle hek cells (thermo fisher scientific) were transfected with pcdna . vector encoding for hsulf- cdna flanked by tev cleavable snap ( . kda) and his tags at n- and c-terminus, respectively. the protein was purified from conditioned medium by cation exchange chromatography on a sp sepharose column (ge healthcare) in mm tris, mm mgcl , mm cacl , ph . , using a .
- m nacl gradient, followed by size exclusion chromatography (superdex , ge healthcare) in mm tris, mm nacl, mm mgcl , mm cacl , ph . treatment of hsulf- with chondroitinase abc was achieved by incubating µg of enzyme with mu chondroitinase abc (sigma) overnight at °c. hsulf- Δsg mutants (Δsg, Δsg , Δsg ) were generated by site directed mutagenesis (isbg robiomol platform) and purified as above.
analysis of hsulf- expression
mda-mb- cells were lysed with ripa buffer for h at °c and tissues were disrupted and lysed in ripa buffer (sigma-aldrich) using a magna lyser instrument (roche) with ceramic beads. supernatants were collected and protein concentration was determined using a bca protein assay kit (thermo scientific). cell lysates (eq. of . cells), tumor lysates (eq. of µg of total proteins) or purified recombinant proteins were then separated by sds-page, followed by transfer onto pvdf membrane. proteins were probed using either rabbit polyclonal h (dil. / ) or mouse monoclonal b (dil. / ) antibodies, followed by incubation with hrp-conjugated anti-rabbit (thermo scientific, dil. / ) or anti-mouse (thermo scientific, dil. / ) secondary antibodies. endogenous cs/ds modification of hsulf- was analyzed in two cell lines: the mcf- human breast cancer cells and human umbilical vein endothelial cells (huvecs). mcf- cells were cultured at °c for h in opti-mem, after which culture medium was collected and concentrated on amicon ultra filters ( kda cut-off, millipore, burlington, ma). conditioned medium from mda-mb cells was prepared likewise, using freestyle medium instead of opti-mem. hsulf- in concentrated samples was analyzed by western blotting as described below. huvecs were cultured at °c in opti-mem containing . % fbs for h, after which culture medium was collected and concentrated on amicon ultra filters.
concentrated samples were incubated with glcnac-binding wheat germ agglutinin (wga)-coated beads (vector laboratories, burlingame, ca) at °c overnight, and proteins that were captured by wga beads were analyzed by western blotting. for elimination of cs/ds chains, the concentrated mcf- culture media or wga bead-bound materials were treated with chondroitinase abc ( u/ml) or heat-inactivated chondroitinase abc ( u/ml), at °c for h. proteins in the samples were separated by sds-page with – % gels (wako pure chemical industries, osaka, japan) and were transferred to pvdf membranes. hsulf- proteins were probed with the b mouse monoclonal anti-hsulf- antibody (dil. / ) or the h . rabbit polyclonal anti-hsulf- antibody (dil. / ) followed by a horseradish peroxidase-labeled anti-mouse or rabbit igg antibody (cell signaling technology, danvers, ma) and immunostar ld (wako pure chemical industries). signals were visualized by using a luminograph image analyzer (atto, tokyo, japan).

saxs analysis

saxs data were collected at the european synchrotron radiation facility (grenoble, france) on the bm beamline at biosaxs. the standard energy was set to . kev and a pilatus m detector was used to record the scattering patterns. the sample-to-detector distance was set to . m (q-range is . - nm- ). samples were loaded into a quartz glass capillary with an automated sample changer. the scattering curve of the buffer solution (recorded before and after the sample) was subtracted from the sample’s saxs curves. scattering profiles were measured at several concentrations, from . to . mg/ml at room temperature.
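scattering profiles of this kind are commonly summarised at low q by a guinier fit, ln I(q) = ln I(0) − Rg²q²/3, which yields the radius of gyration and the forward intensity. the following is a minimal python sketch of such a fit on synthetic data; it is not the atsas/primus procedure used in this work. the function name `guinier_fit`, the `q_max` cut-off and all numerical values are illustrative assumptions.

```python
import math


def guinier_fit(q_values, intensities, q_max):
    """Least-squares fit of ln I(q) = ln I(0) - (Rg**2 / 3) * q**2 for q <= q_max.

    Returns (rg, i0). q_max is a user-chosen cut-off (hypothetical parameter);
    in practice it is chosen so that q * Rg stays within the Guinier regime.
    """
    xs, ys = [], []
    for q, intensity in zip(q_values, intensities):
        if q <= q_max and intensity > 0:
            xs.append(q * q)                # x = q^2
            ys.append(math.log(intensity))  # y = ln I(q)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points in the Guinier region")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                       # equals -Rg^2 / 3
    intercept = mean_y - slope * mean_x     # equals ln I(0)
    if slope >= 0:
        raise ValueError("non-negative slope: data are not Guinier-like")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free profile (illustrative values only): Rg = 3 nm, I(0) = 100.
q = [0.05 + 0.01 * k for k in range(46)]                # q in nm^-1
profile = [100.0 * math.exp(-(3.0 * qi) ** 2 / 3.0) for qi in q]
rg, i0 = guinier_fit(q, profile, q_max=0.4)
print(f"Rg = {rg:.3f} nm, I(0) = {i0:.2f}")
```

with real data the fit is usually repeated at each concentration, and the cut-off is adjusted until the fitted rg is stable.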
data were processed using standard procedures with the atsas v . . suite of programs (petoukhov et al., ). the ab initio determination of the molecular shape of the proteins was performed as previously described (pérard et al., ). radius of gyration (rg) and forward intensity at zero angle (i(0)) were determined with the program primus (konarev et al., ), by using the guinier approximation at low q value, in a q.rg range < . :

$\ln I(Q) = \ln I(0) - \frac{R_g^2 Q^2}{3}$

porod volumes and kratky plots were determined using the guinier approximation and the primus program. the pairwise distance distribution functions p(r) were calculated by indirect fourier transform with the program gnom (svergun, ). the maximum dimension (dmax) value was adjusted so that the rg value obtained from gnom agreed with that obtained from guinier analysis. in order to build ab initio models, several independent dammif (franke and svergun, ) models were calculated in slow mode with pseudo chain option and merged using the program damaver (konarev et al., ).

maldi-tof ms analysis of hsulf- Δsg

ms experiments were carried out on a maldi autoflex speed tof/tof ms instrument (bruker daltonics, germany), equipped with a smartbeam ii™ laser pulsed at khz. the spectra were recorded in the positive linear mode (delay: ns; ion source (is ) voltage: . kv; ion source (is ) voltage: . kv; lens voltage: . kv). maldi data acquisition was carried out in the mass range – da, and shots were summed for each spectrum. mass spectra were processed using flexanalysis software (version . . . , bruker daltonics).
the instrument was calibrated using mono- and multi-charged ions of bsa (bsa calibration standard kit, ab sciex, france). hsulf- Δsg was desalted as previously described (seffouh et al., b). maldi-tof ms analysis was achieved by mixing . μl of sinapinic acid matrix at mg/ml in acetonitrile/water ( / ; v/v), . % tfa, with . μl of the desalted protein solution ( . mg/ml).

lc-ms/ms identification of hsulf- gag chain and its attachment site

the glycoproteomics protocol used for characterizing proteoglycans has been published earlier (noborn et al., ) and most recently summarized in detail for analyses of cs proteoglycans of human cerebrospinal fluid (noborn et al., methods in molecular biology, in press). in the present work, the starting material was conditioned cell media, without fetal calf serum, obtained from sh-sy y cells kindly provided by drs. thomas daugbjerg-madsen and katrine schjoldager, university of copenhagen, denmark.

in vitro enzyme activity assays

detailed protocols for arylsulfatase and endosulfatase assays have been described elsewhere (seffouh et al., a). for the arylsulfatase assay, the enzyme ( µg) was incubated for h with mm mus (sigma) in mm tris, mm mgcl ph . for - h at °c, and the reaction was followed by fluorescence monitoring (excitation nm, emission nm). results are expressed as a fold increase in fluorescence compared to negative control ( mus alone) and correspond to means +/- sd of three independent experiments. the endosulfatase assay was achieved by incubating µg of heparin with µg of enzymes in mm tris, . mm mgcl ph . , for h at °c. disaccharide composition of sulf-treated heparin was then determined by exhaustive digestion of the polysaccharide ( hours at °c) with a cocktail of heparinase i, ii and iii ( mu each), followed by rpip-hplc analysis (henriet et al., ), using nacl gradients calibrated with authentic standards
(iduron). hsulf- ( µg) digestion with hyaluronidase (sigma/aldrich, µg) was achieved after a h incubation in mm tris ph . at °c, prior to the endosulfatase assay, which was performed as above. incubation of heparin with hyaluronidase alone showed no effect on the disaccharide analysis (data not shown).

analysis of hsulf- /heparin binding immuno-assay

as reported before (seffouh et al., a), microtiter plates were first coated with µg/ml streptavidin in tbs buffer, then incubated with µg/ml biotinylated heparin, and saturated with % bsa. all the incubations were carried out for h at rt, in mm tris-cl, mm nacl, ph . (tbs) buffer. next, the recombinant protein was added, then probed with h . primary rabbit polyclonal anti-hsulf- antibody (dil. / ) followed by fluorescent-conjugated secondary anti-rabbit antibody (jackson immunoresearch laboratories, dil. / ). all the incubations were performed for h at °c in tbs, . % tween, and were separated by extensive washes with tbs, . % tween. finally, fluorescence of each well was measured (excitation nm, emission nm). kds were determined by scatchard analysis of the binding data. results shown are representative of three independent experiments.

facs analysis

wish cells ( million for each condition) were washed with pbs, % bsa (the same buffer was used throughout the experiment), and incubated with µg of hsulf- enzymes ( h at °c). after extensive washing, cells were incubated with h . primary antibody (dil. / , h at °c), washed again, then incubated with secondary alexa -conjugated antibody (jackson immunoresearch laboratories, dil. / , h at °c).
facs analysis of cell fluorescence was performed on a macsquant analyzer (miltenyi biotec, excitation nm, emission nm) by calculating median over events. data are represented as means +/- sd of three independent experiments.

lentiviral transduction of mda-mb cells

hsulf- (wt and variant) encoding cdnas were cloned into the plvx lentiviral vector (clontech). this vector was then used in combination with viral vectors gag pol (pspax ) and env vsv-g (pcmv) to transduce hek t and produce lentiviruses released in the extracellular medium. the plvx-ds-red n (clontech) was used as negative control. mda-mb- cells were purchased from atcc and were cultured in leibovitz’s medium (life technologies) supplemented with % fetal bovine serum, u/ml of penicillin, µg/ml of streptomycin (life technologies). for infection, mda-mb- cells were plated into well-plates ( x in ml of serum-supplemented leibovitz’s medium). the day after, adherent cells were incubated with ml of lentiviral medium diluted in ml of serum-supplemented medium containing μg/μl of polybrene (sigma/aldrich). after h, ml of medium were added to cultures and transduction was maintained for h before washing the cells and changing the medium. for stable transduction, puromycin selection was started h post-infection (at the concentration of μg/ml, life technologies) and was maintained thereafter.

in vivo experiments

in vivo experimental protocols were approved in accordance with the institutional guidelines and those of the european community for the use of experimental animals.
-week-old female nod scid gamma/j mice were purchased from charles river and maintained in the animal resources centre of our department. x mda-mb- cells resuspended in % matrigel™ (becton dickinson) in leibovitz medium (life technologies) were injected into the fat pad of the # left mammary gland. tumor growth was recorded by sequential determination of tumor volume using a caliper. tumor volume was calculated according to the formula v = ab²/ (a, length; b, width). mice were euthanized after days through cervical dislocation. tumors and axillary lymph nodes were collected, weighed and either fixed for h in % paraformaldehyde (pfa) and embedded in paraffin, or stored at - °c for wb analysis. tissue necrosis was assessed by hematoxylin/eosin staining and imagej quantification. for vascularization analysis, sections ( μm thick) of formalin-fixed, paraffin-embedded tumor tissue samples were dewaxed, rehydrated and subjected to antigen retrieval in citrate buffer (antigen unmasking solution, vector laboratories) with heat. slides were incubated for min in hydrogen peroxide (h₂o₂) to block endogenous peroxidases and then min in saturation solution (histostain, invitrogen) to block non-specific antibody binding. this was followed by overnight incubation, at °c, with primary antibody against αsma (ab , abcam, dil. / ). after washing, sections were incubated with a suitable biotinylated secondary antibody (histostain kit, invitrogen) for min. antigen-antibody complexes were visualized by applying a streptavidin-biotin complex (histostain, invitrogen) for min followed by novared substrate (vector laboratories). sections were counterstained with hematoxylin to visualize nuclei. control sections were incubated with secondary antibody alone. lungs were inflated using % pfa and embedded in paraffin. the metastatic burden was assessed by serial sectioning of the entire lungs, every µm. hematoxylin and eosin staining was performed on lung and lymph
node sections ( µm thick). images were acquired using an axioscan z (zeiss) slide scanner and quantified using fiji software.

statistical analysis

experimental data are shown as mean ± standard error of the mean (sem) unless specified otherwise. comparisons between multiple groups were carried out by a repeated-measures two-way analysis of variance (anova) with tukey’s multiple comparisons test to evaluate the significance of differential tumor growth between three groups of mice; an ordinary two-way anova with bonferroni’s test and an ordinary one-way anova were carried out to evaluate in vitro activity and binding of hsulf- , the differential level of necrosis and vascularization inside tumors, and pulmonary metastases (number and area). prism (graphpad software, inc., ca) was used for analyses. a probability value of less than . was considered significant. * p < . , ** p < . , *** p < . and **** p < . .

acknowledgments

the authors would like to thank the animal unit staff (jeannin i., bama s., magallon c., chaumontel n. and pointu h.) at the interdisciplinary research institute of grenoble (irig) for animal husbandry. this work used the platforms of the grenoble instruct-eric center (isbg; ums cnrs-cea-ujf-embl) within the grenoble partnership for structural biology (psb). platform access was supported by frisbi (anr- -inbs- - ) and gral, a project of the university grenoble alpes graduate school (ecoles universitaires de recherche) cbh-eur-gs (anr- -eure- ).
this work was also supported by the cnrs and the gdr gag (gdr ), the “investissements d’avenir” program glyco@alps (anr- - idex- ), by grants from the agence nationale de la recherche (anr- -bsv - and anr- -ce - ) and université grenoble-alpes (uga agir program), the swedish research council ( - to gl and to the swedish national infrastructure for biological mass spectrometry (bioms)), and the inga-britt and arne lundbergs forskningsstiftelse. ibs acknowledges integration into the interdisciplinary research institute of grenoble (irig, cea).

references

ai, x., do, a.t., kusche-gullberg, m., lindahl, u., lu, k., and emerson, c.p., jr. ( ). substrate specificity and domain functions of extracellular heparan sulfate -o-endosulfatases, qsulf and qsulf . j biol chem , – . ambasta, r.k., ai, x., and emerson, c.p., jr. ( ). quail sulf function requires asparagine-linked glycosylation. j biol chem , – . bret, c., moreaux, j., schved, j.f., hose, d., and klein, b. ( ). sulfs in human neoplasia: implication as progression and prognosis factors. j transl med , . chua, j.s., and kuberan, b. ( ). synthetic xylosides: probing the glycosaminoglycan biosynthetic machinery for biomedical applications. acc. chem. res. , – . csoka, a.b., frost, g.i., and stern, r. ( ). the six hyaluronidase-like genes in the human and mouse genomes. matrix biol. , – . dhoot, g.k., gustafsson, m.k., ai, x., sun, w., standiford, d.m., and emerson, c.p., jr. ( ). regulation of wnt signaling and embryo patterning by an extracellular sulfatase. science , – .
dierks, t., dickmanns, a., preusser-kunze, a., schmidt, b., mariappan, m., von figura, k., ficner, r., and rudolph, m.g. ( ). molecular basis for multiple sulfatase deficiency and mechanism for formylglycine generation of the human formylglycine-generating enzyme. cell , – . dominguez gutierrez, p.r., kwenda, e.p., donelan, w., o’malley, p., crispen, p.l., and kusmartsev, s. ( ). hyal expression in tumor-associated myeloid cells mediates cancer-related inflammation in bladder cancer. cancer res. el masri, r., seffouh, a., lortat-jacob, h., and vivès, r.r. ( ). the “in and out” of glucosamine -o- sulfation: the th sense of heparan sulfate. glycoconj. j. , – . esko, j.d., and zhang, l. ( ). influence of core protein sequence on glycosaminoglycan assembly. curr. opin. struct. biol. , – . franke, d., and svergun, d.i. ( ). dammif, a program for rapid ab-initio shape determination in small-angle scattering. j. appl. crystallogr. , – . frese, m.a., milz, f., dick, m., lamanna, w.c., and dierks, t. ( ). characterization of the human sulfatase sulf and its high affinity heparin/heparan sulfate interaction domain. j biol chem , – . hanson, s.r., best, m.d., and wong, c.h. ( ). sulfatases: structure, mechanism, biological activity, inhibition, and synthetic utility. angew chem int ed engl , – . harder, a., möller, a.-k., milz, f., neuhaus, p., walhorn, v., dierks, t., and anselmetti, d. ( ). catch bond interaction between cell-surface sulfatase sulf and glycosaminoglycans. biophys. j. , – . henriet, e., jäger, s., tran, c., bastien, p., michelet, j.-f., minondo, a.-m., formanek, f., dalko-csiba, m., lortat-jacob, h., breton, l., et al. ( ). a jasmonic acid derivative improves skin healing and induces changes in proteoglycan expression and glycosaminoglycan structure. biochim. biophys. acta , – . jedrzejas, m.j., and stern, r. ( ). structures of vertebrate hyaluronidases and their unique enzymatic mechanism of hydrolysis. proteins struct. funct. bioinforma. , – .
kaneiwa, t., mizumoto, s., sugahara, k., and yamada, s. ( ). identification of human hyaluronidase- as a novel chondroitin sulfate hydrolase that preferentially cleaves the galactosaminidic linkage in the trisulfated tetrasaccharide sequence. glycobiology , – . konarev, p.v., volkov, v.v., sokolova, a.v., koch, m.h.j., and svergun, d.i. ( ). primus: a windows pc-based system for small-angle scattering data analysis. j. appl. crystallogr. , – . lemjabbar-alaoui, h., van zante, a., singer, m.s., xue, q., wang, y.q., tsay, d., he, b., jablons, d.m., and rosen, s.d. ( ). sulf- , a heparan sulfate endosulfatase, promotes human lung carcinogenesis. oncogene , – . li, j.-p., and kusche-gullberg, m. ( ). heparan sulfate: biosynthesis, structure, and function. int. rev. cell mol. biol. , – . mcatee, c.o., barycki, j.j., and simpson, m.a. ( ). emerging roles for hyaluronidase in cancer metastasis and therapy. adv. cancer res. , – . mead, t.j., mcculloch, d.r., ho, j.c., du, y., adams, s.m., birk, d.e., and apte, s.s. ( ). the metalloproteinase-proteoglycans adamts and adamts provide an innate, tendon-specific protective mechanism against heterotopic ossification. jci insight . morimoto-tomita, m., uchimura, k., werb, z., hemmerich, s., and rosen, s.d. ( ). cloning and characterization of two extracellular heparin-degrading endosulfatases in mice and humans. j biol chem , – . noborn, f., gomez toledo, a., sihlbom, c., lengqvist, j., fries, e., kjellen, l., nilsson, j., and larson, g. ( ).
identification of chondroitin sulfate linkage region glycopeptides reveals prohormones as a novel class of proteoglycans. mol cell proteomics , – . pasquato, a., dettin, m., basak, a., gambaretto, r., tonin, l., seidah, n.g., and di bello, c. ( ). heparin enhances the furin cleavage of hiv- gp peptides. febs lett , – . pempe, e.h., burch, t.c., law, c.j., and liu, j. ( ). substrate specificity of -o-endosulfatase (sulf- ) and its implications in synthesizing anticoagulant heparan sulfate. glycobiology , – . pérard, j., nader, s., levert, m., arnaud, l., carpentier, p., siebert, c., blanquet, f., cavazza, c., renesto, p., schneider, d., et al. ( ). structural and functional studies of the metalloregulator fur identify a promoter-binding mechanism and its role in francisella tularensis virulence. commun. biol. , . peterson, s.m., iskenderian, a., cook, l., romashko, a., tobin, k., jones, m., norton, a., gomez-yafal, a., heartlein, m.w., concino, m.f., et al. ( ). human sulfatase inhibits in vivo tumor growth of mda-mb- human breast cancer xenografts. bmc cancer , . petoukhov, m.v., franke, d., shkumatov, a.v., tria, g., kikhney, a.g., gajda, m., gorba, c., mertens, h.d.t., konarev, p.v., and svergun, d.i. ( ). new developments in the atsas program package for small-angle scattering data analysis. j. appl. crystallogr. , – . rosen, s.d., and lemjabbar-alaoui, h. ( ). sulf- : an extracellular modulator of cell signaling and a cancer target candidate. expert opin ther targets , – . sarrazin, s., lamanna, w.c., and esko, j.d. ( ). heparan sulfate proteoglycans.
cold spring harb perspect biol . seffouh, a., milz, f., przybylski, c., laguri, c., oosterhof, a., bourcier, s., sadir, r., dutkowski, e., daniel, r., van kuppevelt, t.h., et al. ( ). hsulf sulfatases catalyze processive and oriented -o- desulfation of heparan sulfate that differentially regulates fibroblast growth factor activity. faseb j , – . seffouh, a., el masri, r., makshakova, o., gout, e., hassoun, z.e.o., andrieu, j.-p., lortat-jacob, h., and vivès, r.r. ( a). expression and purification of recombinant extracellular sulfatase hsulf- allows deciphering of enzyme sub-domain coordinated role for the binding and -o-desulfation of heparan sulfate. cell. mol. life sci. cmls , – . seffouh, i., przybylski, c., seffouh, a., el masri, r., vivès, r.r., gonnet, f., and daniel, r. ( b). mass spectrometry analysis of the human endosulfatase hsulf- . biochem. biophys. rep. , . svergun, d.i. ( ). determination of the regularization parameter in indirect-transform methods using perceptual criteria. j. appl. crystallogr. , – . tang, r., and rosen, s.d. ( ). functional consequences of the subdomain organization of the sulfs. j biol chem , – . tian, c., Öhlund, d., rickelt, s., lidström, t., huang, y., hao, l., zhao, r.t., franklin, o., bhatia, s.n., tuveson, d.a., et al. ( ). cancer cell-derived matrisome proteins promote metastasis in pancreatic ductal adenocarcinoma. cancer res. , – . uchimura, k., morimoto-tomita, m., bistrup, a., li, j., lyon, m., gallagher, j., werb, z., and rosen, s.d. ( ). hsulf- , an extracellular endoglucosamine- -sulfatase, selectively mobilizes heparin- bound growth factors and chemokines: effects on vegf, fgf- , and sdf- . bmc biochem , . vives, r.r., seffouh, a., and lortat-jacob, h. ( ). post-synthetic regulation of hs structure: the yin and yang of the sulfs in cancer. front oncol , . walhorn, v., möller, a.-k., bartz, c., dierks, t., and anselmetti, d. ( ). exploring the sulfatase catch bond free energy landscape using jarzynski’s equality. 
sci. rep. , . yang, j.d., sun, z., hu, c., lai, j., dove, r., nakamura, i., lee, j.s., thorgeirsson, s.s., kang, k.j., chu, i.s., et al. ( ). sulfatase and sulfatase in hepatocellular carcinoma: associated signaling pathways, tumor phenotypes, and survival. genes. chromosomes cancer , – . zhu, c., he, l., zhou, x., nie, x., and gu, y. ( ). sulfatase promotes breast cancer progression through regulating some tumor-related factors. oncol. rep. , – .

figures

fig. : purification and characterization of hsulf- and hsulf- Δsg. size exclusion chromatography profile of hsulf- wt (a), chondroitinase abc pre-treated hsulf- (b) and hsulf- Δsg (c); grey bars indicate sulf-containing fractions. (d) page/coomassie blue staining and western blot analysis of hsulf- wt (wt, lanes ), hsulf- Δsg (Δsg, lanes ) and chondroitinase abc pre-treated hsulf- (cs, lane ), using the anti hd h antibody. analysis shows a ~ kda band corresponding to hsulf- n-terminal subunit in fusion with the snap-tag (snap-nter) and multiple/broad ~ kda bands corresponding to the c-terminal subunit, which includes hsulf- hd domain (cter).
of note, a residual ~ kda band corresponding to the n-terminal subunit lacking its snap-tag could also be detected (nter). in addition, coomassie blue staining, but not wb, revealed the presence of a full-length, unprocessed gag-free hsulf- form (snap-unprocessed).

fig. : endogenous expression of gag bearing hsulf- in mcf and huvec cells. western blot analysis of pre-purified concentrated conditioned medium from mcf (a, c) and huvec (b, d) using anti c-ter b (a, b) and anti n-ter h . antibodies (c, d), prior to ( , lanes ) or after treatment with chondroitinase abc (cs, lanes ). digestions with heat-inactivated chondroitinase abc were used as controls (cs inac., lanes ). the nature of detected bands is shown as follows: black symbols for cs/ds conjugated fragments; white symbols for gag-free fragments; triangles for the n-terminal subunit, squares for the c-terminal subunit.
of note, analysis indicates the presence in both samples of unprocessed forms (triangle + square, sharp band at ~ kda), and at least in the huvec conditioned medium, the presence of gag-free hsulf- forms (bands corresponding to c-ter fragments within the - kda mw range, and an unprocessed form at kda detected in the untreated samples, gel b lanes and ).

fig. : biological activities of hsulf- wt and gag-free hsulf- . (a) hs -o-endosulfatase activity of hsulf- wt (black symbols) and hsulf- Δsg (white symbols) was assessed by monitoring the time course digestion of [ua( s)-glcns( s)] trisulfated disaccharides (ns s s, square) into [ua( s)-glcns] disulfated disaccharides (ns s, circle). data are expressed as a percentage of total disaccharide content. ordinary two-way anova with time of incubation and type of hsulf- as factors revealed significant effects on -o-endosulfatase activity (time: f , = . , p < . ; hsulf- type: f , = . , p < . ; interaction: f , = . , p < . ) (left panel) and a concomitant increase in the digested product (time: f , = , p < . ; hsulf- type: f , = , p < . ; interaction: f , = . , p < . ) (right panel).
post-hoc bonferroni’s test showed significant difference in the hs -o-endosulfatase activity at h incubation and thereafter until h in hsulf- Δsg (n= ) compared with hsulf- wt (n= ). error bars indicate sd. (b) binding immunoassay of hsulf- wt (black) and hsulf- Δsg (white) to a streptavidin-immobilized heparin surface. data are representative of three independent experiments. (c) the aryl-sulfatase activity of hsulf- wt (n= , black) and hsulf- Δsg (n= , white) was measured using mus fluorogenic pseudo-substrate. results are expressed as a fluorescence fold increase compared to negative control ( mus alone, n= , grey). (d) binding of hsulf- wt (n= , black) and hsulf- Δsg (n= , white) to the surface of human amnion-derived wish cells was monitored by facs using the h . anti-hsulf- antibody. ordinary one-way anova with type of hsulf- as a factor revealed significant effects on aryl-sulfatase activity (f , = . , p < . ) (c) and on cell-surface binding (f , = . , p < . ) (d). post-hoc tukey’s range test showed significant difference in the mus activity and binding to human wish cells in hsulf- Δsg compared with control (n= , grey) or hsulf- wt. error bars indicate sd. (e) western blot analysis of hsulf- wt (lane ) and hyaluronidase treated hsulf- wt (lane ), using the anti hd h antibody. (f) [ua( s)-glcns( s)] trisulfated disaccharide (ns s s, black) and [ua( s)-glcns] disulfated disaccharide (ns s, white) content (as in (a), expressed as a percentage of total disaccharide content, n= ) of heparin, without (hp) or after digestion with hsulf- wt or hyaluronidase (hyal)-treated hsulf- wt ( h at °c). data show significantly increased heparin -o-desulfation for hyal-treated hsulf- wt. error bars indicate sd (****p< . ).
fig. : effects of hsulf- wt and hsulf- Δsg during tumor progression and metastasis. (a) time course measurement of tumor size induced by mda-mb- cells expressing dsred, hsulf- wt or hsulf- Δsg. statistical analysis was performed using a two-way anova test, ***p≤ . and **p< . . representative pictures of each tumor group at day (a, right panel). (b) histological analysis of necrotic area using hematoxylin/eosin staining of tumors expressing mock dsred, hsulf- wt and hsulf- Δsg. the percentage of necrotic area was determined on three sections from each of the six mice in each group (one-way anova test, multiple comparison, tukey’s test, n= , ***p≤ . and **p< . ). (c) histological analysis of the percentage of pulmonary metastatic area from dsred, hsulf- and hsulf- Δsg expressing tumors. the measurement was performed on three sections from each of the six mice in each group (one-way anova test, multiple comparison, tukey’s test, n= , **p< . and ***p< . ). (d) the size of pulmonary metastasis in each group was quantified and analyzed as in c (n= , **p< . and ***p< . ). (e) representative images of hematoxylin/eosin stained sections of indicated lung.
supplementary material fig. s : schematic representation of hsulf- molecular organization, ptms and antibody epitopes. hsulf- amino-acid (a.a.) pro-protein comprises a signal peptide (sp, black box) and a polypeptide processed through furin cleavage (black arrows) into two n-terminal (n-term) and c-terminal (c-term) subunits. hsulf- comprises two major functional domains: a catalytic domain (cat, in grey) and a highly basic hydrophilic region (hd, hatched in grey), and features a c-terminal region sharing homology with glucosamine- -sulfatase homolog (c, dotted). potential n-glycosylation sites (n), the catalytic fgly residue (fgly, in red and bold) and the sg dipeptides (blue, in bold for s g) and antibody epitopes (black bars) are indicated. fig. s : study of hsulf- and hsulf- hd by small angle x-ray scattering. saxs analysis of hsulf- (panels a-e) and hsulf- hd (panels f-k). (a) scattering curves of experimental data of hsulf- in solution. (b) linear dependence of ln[i(q)] vs q determined by guinier plot at . mg/ml with a rg of nm and i of with a porod volume more than . hsulf- gives a mwexp: kda. mwmalls: kda, mwth protein: , kda. these data indicate the presence of an elongated molecule with a potential rod shape. (c) pair distribution function p(r) in arbitrary units (arb.u) vs.
r (nm) determined by gnom with a dmax of nm +/- nm indicates that hsulf- is an elongated molecule in solution. (d) globularity and flexibility analysis of hsulf- . the kratky plot (i(q)*q vs. q) of hsulf- does not converge to the q axis, which indicates the presence of a mixture of multidomain protein with flexible linker and unfolded region (which could be attributed to the gag). (e) final ab initio model of hsulf- generated with individual dammif model in slow mode. damaver classification under nsd value indicates the presence of several clusters (nsd > . ) for hsulf- suggesting the presence of flexible regions. the proposed final model of hsulf- combines of the models calculated with the best nsd (between . to . ). (f) scattering curves of experimental data of hsulf- hd domain in solution. (g) linear dependence of ln[i(q)] vs q determined by guinier plot at several concentrations between , to mg/ml gives a linear region with rg of . nm and a i of with a porod volume of . mwexp: kda (mwmalls: kda, mwth protein: kda). these data indicate the presence of a potentially globular protein. (h) pair distribution function p(r) in arbitrary units (arb.u) vs. r (nm) determined by gnom gives a dmax of nm with a relative globular shape. (i) the kratky plot of hsulf- hd presents a "bell-shaped" peak at low q and converges to the q axis at high q, corresponding to a well-folded globular protein. (j) final ab initio model of hsulf- hd generated with individual dammif model in slow mode and merged with damaver (nsd < . ). the hsulf- hd ab initio model gives a globular envelope with a small elongated part.
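The Guinier analysis mentioned in panels (b) and (g) is conventionally a straight-line fit of ln I(q) against q² at low q, where the slope equals −Rg²/3 and the intercept gives ln I(0). The Python sketch below illustrates that fit on synthetic data only. The function name `guinier_fit`, the q range and the chosen values of Rg and I(0) are illustrative assumptions; this is not the authors' analysis pipeline and uses none of their measurements.

```python
import math


def guinier_fit(q, intensity):
    """Least-squares fit of ln I(q) = ln I(0) - (Rg^2 / 3) * q^2.

    Returns (Rg, I0). Assumes the points passed in already lie in the
    Guinier region (commonly taken as q * Rg below about 1.3).
    """
    x = [qi ** 2 for qi in q]
    y = [math.log(ii) for ii in intensity]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("Guinier slope must be negative to define Rg")
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic, noise-free example (hypothetical values, not from the study).
true_rg, true_i0 = 3.0, 100.0                    # nm, arbitrary units
q_values = [0.05 + 0.01 * k for k in range(30)]  # nm^-1, q * Rg stays near 1 or below
intensities = [true_i0 * math.exp(-(true_rg ** 2) * qi ** 2 / 3.0) for qi in q_values]

rg, i0 = guinier_fit(q_values, intensities)
print(f"Rg = {rg:.3f} nm, I(0) = {i0:.2f}")
```

With real scattering curves the fit is usually restricted to the lowest-q points and repeated across concentrations to check for aggregation or interparticle effects.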
(k) superimposition of predicted structure of hsulf- hd based on pdb: upl (from silicibacter pomeroyi) structure (phyre analysis) into hsulf- hd saxs envelope with supcomb program. fig. s : size-exclusion chromatography of hsulf- wt, hsulf- variants and hsulf- . size exclusion chromatography profile of hsulf- wt without (a), or following pre-treatment with heparinase i, ii, iii (b), m nacl (c), or xyloside (d). size exclusion chromatography profile of hsulf- Δs g (e), Δs g (f), or hsulf- (g). grey bars show sulf-containing peaks. fig. s : size-exclusion chromatography of tev-treated hsulf- wt and variant. size exclusion chromatography profile of tev-treated hsulf- wt (a), chondroitinase abc-treated hsulf- wt (b), or hsulf- Δsg (c). fig. s : maldi-tof mass spectrometry analysis of hsulf- Δsg.
mass spectrum of hsulf- Δsg in positive ionization mode ( kda-filtered hsulf- Δsg mixed with sinapinic acid matrix, linear mode). hsulf- Δsg is detected as the protonated species [m+h]+ and corresponding doubly and triply charged [m+ h] + and [m+ h] + ions. fig. s : lc-ms/ms detection of a cs/ds gag linkage region attached to s of hsulf- . hsulf- glycopeptides were obtained by trypsin digestion of media of cultured sh-sy y cells, followed by enrichment on a sax column, and thereafter treatment with chondroitinase abc. the spectral files were filtered for the ms diagnostic ion at m/z . corresponding to the delta-hexuronic acid-n-acetylgalactosamine disaccharide, common to all cs/ds linkage region glycopeptides. (a) ms spectrum of the -dggdfsgtgglpdysaanpik- glycopeptide obtained by hcd with normalized collision energy of %, providing prominent glycan fragmentations. (b) ms spectrum of the same glycopeptide obtained at normalized collision energy of %, displaying peptide sequence fragmentation with b- and y-ions annotated in the sequence. the positioning and distinction of sulfate ( . u) and phosphate ( . u) modifications were done by manually interpreting the ms spectra.
the ms spectrum thus displayed a mass shift of . u between m/z . and m/z . , demonstrating the presence of a sulfate modification on the galnac residue. a mass shift of . u was observed between m/z . ( +) and m/z . ( +), demonstrating the presence of a xylose plus phosphate modification of the peptide (the theoretical mass of this modification is . u ( . u + . u)). fig. s : expression and activity of hsulf- in mda-mb- transduced cells. (a) western blot analysis of dsred (lanes ), hsulf- wt (lanes ) or hsulf- Δsg (lanes ) transduced mda-mb- cell lysates, using anti c-terminal hsulf- b antibody and anti-actin antibody (sigma, ref a- ) as a loading control. (b) endosulfatase activity was monitored by treating heparin with dsred, hsulf- wt or hsulf- Δsg transduced mda-mb- cell conditioned medium. results are expressed as fold increase of ns s disaccharide content compared to untreated heparin (control). (c) western blot analysis (h antibody) of hsulf- wt transduced mda-mb- cell conditioned medium prior to (wt, lane ) or after (wt / csase, lane ) treatment with chondroitinase abc.
(d) western blot analysis ( b antibody) of mouse tumor lysates resulting from injections of dsred (lanes and ), hsulf- wt (lanes and ) or hsulf- Δsg (lanes and ) transduced mda-mb- cells. fig. s : effects of hsulf- wt and hsulf- Δsg on tumor metastasis. (a) histological analysis of the vascularized area, using α smooth muscle actin (αsma) immunostaining of tumors expressing mock dsred, hsulf- wt and hsulf- Δsg. the calculation of vascularized area of tumors was performed on five mice in each group. for each mouse, rois (regions of interest) were quantified: the αsma positive area was measured for each roi and divided by the total roi area, giving for each mouse a percentage of vascularized area. in each group, the median of this percentage was divided by the mean of the median of the dsred control mice (one-way anova, multiple comparison, bonferroni test, *p= . , **p= . ). (b) representative images of αsma staining with different magnifications. (c) percentage of mice with metastasis found in the lung, left axillary and right axillary lymph node (ln) from dsred, hsulf- and hsulf- Δsg expressing tumors.
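The vascularized-area quantification in fig. s (a) can be sketched as a short calculation. The Python snippet below follows one reading of the legend: each ROI gives a percentage (αSMA-positive area over ROI area), each mouse is summarised by the median of its ROI percentages, and every mouse is then expressed relative to the mean of the DsRed control medians. That reading, the function names, the group labels and all numbers are assumptions made for illustration. The legend does not specify how ROIs were aggregated per mouse, so this is not presented as the authors' exact procedure.

```python
from statistics import mean, median


def roi_percentage(positive_area, roi_area):
    """Percentage of one ROI that is aSMA-positive."""
    if roi_area <= 0:
        raise ValueError("ROI area must be positive")
    return 100.0 * positive_area / roi_area


def mouse_vascularized_percentage(rois):
    """Summarise one mouse as the median of its per-ROI percentages (assumed)."""
    return median(roi_percentage(pos, total) for pos, total in rois)


def normalise_to_control(groups, control="dsred"):
    """Express each mouse relative to the mean of the control-group medians."""
    per_mouse = {
        name: [mouse_vascularized_percentage(rois) for rois in mice]
        for name, mice in groups.items()
    }
    reference = mean(per_mouse[control])
    return {
        name: [value / reference for value in values]
        for name, values in per_mouse.items()
    }


# Hypothetical data: group -> list of mice -> list of (positive area, ROI area).
example_groups = {
    "dsred": [
        [(10, 100), (20, 100), (30, 100)],
        [(20, 100), (20, 100), (20, 100)],
    ],
    "hsulf_wt": [
        [(40, 100), (40, 100), (40, 100)],
    ],
}

normalised = normalise_to_control(example_groups)
print(normalised)
```

Normalising to the control mean puts the control group near 1, so values for the other groups read directly as fold changes in vascularized area.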
o'keefe et al. joces resubmission ipomoeassin-f inhibits the in vitro biogenesis of the sars- cov- spike protein and its host cell membrane receptor sarah o’keefe , , peristera roboti , kwabena b. duah , guanghui zong , hayden schneider , wei q. shi and stephen high , school of biological sciences, faculty of biology, medicine and health, university of manchester, manchester, m pt, united kingdom department of chemistry, ball state university, muncie, indiana , usa department of chemistry and biochemistry, university of maryland, college park, maryland , usa lead contacts for correspondence: sarah.okeefe@manchester.ac.uk; stephen.high@manchester.ac.uk running title ipom-f as a potential antiviral agent keywords cell-free translation, endoplasmic reticulum (er), er membrane complex (emc), sec translocon, viral protein biogenesis. .cc-by-nd . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nd/ . / abstract in order to produce proteins essential for their propagation, many pathogenic human viruses, including sars-cov- , the causative agent of covid- respiratory disease, commandeer host biosynthetic machineries and mechanisms.
three major structural proteins, the spike, envelope and membrane proteins, are amongst several sars-cov- components synthesised at the endoplasmic reticulum (er) of infected human cells prior to the assembly of new viral particles. hence, the inhibition of membrane protein synthesis at the er is an attractive strategy for reducing the pathogenicity of sars-cov- and other obligate viral pathogens. using an in vitro system, we demonstrate that the small molecule inhibitor ipomoeassin f (ipom-f) potently blocks the sec -mediated er membrane translocation/insertion of three therapeutic protein targets for sars-cov- infection: the viral spike and orf proteins together with angiotensin-converting enzyme , the host cell plasma membrane receptor. our findings highlight the potential for using er protein translocation inhibitors such as ipom-f as host-targeting, broad-spectrum, antiviral agents. introduction many viruses, including sars-cov- (zhou et al., ; zhu et al., ) (fig. a), hijack the host cell secretory pathway to correctly synthesise, fold and assemble important viral proteins (bojkova et al., ; gordon et al., ; sicari et al., ). hence, small molecule inhibitors of sec -mediated co-translational protein entry into the endoplasmic reticulum (er) (luesch and paavilainen, ) have potential as broad-spectrum antivirals (heaton et al., ; shah et al., ). such inhibitors offer a dual approach: first, by directly inhibiting production of key viral proteins and, second, by reducing levels of host proteins co-opted during viral infection.
for example, human angiotensin-converting enzyme (ace ) is an important host cell receptor for sars-cov- viral entry (cantuti-castelvetri et al., ; daly et al., ; walls et al., ) synthesised at the er prior to its trafficking to the plasma membrane (warner et al., ). our recent studies show that ipomoeassin-f (ipom-f) (fig. b) is a potent and selective inhibitor of sec -mediated protein translocation at the er membrane (zong et al., ; o’keefe et al., submitted). given that sars-cov- membrane proteins likely co-opt host mechanisms of er entry (cf. gordon et al., ; sicari et al., ), we concluded that their sensitivity to ipom-f would likely be comparable to that of endogenous sec clients (fig. c; see also zong et al., ; o’keefe et al., submitted). we, therefore, evaluated the effects of ipom-f on sars-cov- proteins containing hydrophobic er targeting signals (fig. d). the in vitro membrane insertion of the viral spike (s) protein and membrane translocation of the orf protein are both strongly inhibited by ipom-f, whilst several other viral membrane proteins are unaffected (fig. ). likewise, the er integration of ace , an important host receptor for sars-cov- (walls et al., ), is highly sensitive to ipom-f (fig. ). we show that the principal molecular basis for the ipom-f sensitivity of sars-cov- proteins is their dependence on sec , as dictated by their individual structural features and membrane topologies (fig. ). taken together, our in vitro study of sars-cov- protein synthesis at the er highlights ipom-f as a promising candidate for the development of a broad-spectrum, host-targeting, antiviral agent.
results and discussion ipom-f selectively inhibits er translocation of the viral orf and s proteins to explore the ability of ipom-f to inhibit the er translocation of a small, yet structurally diverse, panel of sars-cov- membrane and secretory-like proteins, we first used a well-established in vitro translation system supplemented with canine pancreatic microsomes (fig. a). to facilitate the detection of er translocation, we modified the viral orf , s, e, m and orf proteins by adding an opg -tag; an epitope that supports efficient er lumenal n-glycosylation and enables product recovery via immunoprecipitation, without affecting ipom-f sensitivity (fig. s a) (o’keefe et al., submitted). for viral proteins that lack endogenous sites for n-glycosylation, such as the e protein, the er lumenal opg -tag acts as a reporter for er translocation and enables their recovery by immunoprecipitation. where viral proteins already contain suitable sites for n-glycosylation (s and m proteins), the cytosolic opg -tag is used solely for immunoprecipitation. the identity of the resulting n-glycosylated species for each of these opg -tagged viral proteins was confirmed by endoglycosidase h (endo h) treatment of the radiolabelled products associated with the membrane fraction prior to sds-page (fig. b, cf. lanes and in each panel). using er lumenal modification of either endogenous n-glycosylation sites (viral s and m proteins) or the appended opg -tag (viral e and orf proteins) as a reporter for er membrane translocation, we found that µm ipom-f strongly inhibited both the translocation of the soluble, secretory-like protein orf -opg and the integration of the type i transmembrane protein (tmp) s-opg and truncated derivatives thereof (fig. b, fig. c, fig. s c). furthermore, membrane insertion of the human type i tmp, ace , was inhibited to a similar extent (fig. b, fig.
c, ~ to ~ % inhibition for these three proteins). these results mirror previous findings showing that precursor proteins bearing n-terminal signal peptides, and which are therefore obligate clients for the sec -translocon, are typically very sensitive to ipom-f-mediated inhibition (zong et al., ; o’keefe et al., submitted). in the context of sars-cov- infection, wherein ace acts as an important host cell receptor for the sars-cov- virus via its interaction with the viral s protein (walls et al., ), these data suggest that an ipom-f-induced antiviral effect might be achieved via selective reductions in the biogenesis of both host and viral proteins (cf. fig. a). in contrast to the viral s and orf proteins, insertion of the viral e protein was unaffected by ipom-f (fig. b-c), consistent with its recent classification as a type iii tmp (duart et al., ). type iii tmp integration is highly resistant to ipom-f (zong et al., ), most likely because such proteins exploit a novel pathway for er insertion (cf. fig. ; o’keefe et al., submitted). we therefore conclude that the known substrate-selective inhibitory action of ipom-f at the sec translocon is directly applicable to viral membrane proteins, whereby the er translocation of secretory proteins and type i tmps, but not type iii tmps, is efficiently blocked by ipom-f. the viral m protein is a multi-pass tmp with its first tmd oriented so the n-terminus is exoplasmic (nexo) and hence can be considered “type iii-like”.
although human multi-pass tmps of this type typically require both the er membrane complex (emc) and sec translocon for their authentic er insertion (chitwood et al., ), ipom-f had no significant effect on the er translocation/insertion of the m protein in vitro, as judged by the efficiency of n-glycosylation of its n-terminal domain (fig. c). we conclude that the integration of its first tmd is unaffected by ipom-f, consistent with its use of the emc (chitwood et al., ; o’keefe et al., submitted). there is, however, a qualitative reduction in the intensity of both the non- and n-glycosylated forms of the m protein when compared to the control (see fig. b and fig. s a). we speculate that this decrease may reflect an ipom-f-induced effect on the sec -dependent integration of the second and/or third tm-spans of the m protein (cf. chitwood et al., ) and our future studies will aim to resolve this question. nevertheless, like similar host cell multi-pass tmps that are resistant to a similar sec inhibitor, mycolactone (morel et al., ), the m protein appears more resistant to ipom-f than either the s or orf proteins (fig. , fig. s a). in practice, the potential resistance of this highly abundant and functionally diverse class of endogenous multi-spanning membrane proteins (von heijne, ) may limit any ipom-f-induced cytotoxicity towards host cells.
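The glycosylation read-out used above depends on N-glycosylation consensus sites (sequons, N-X-S/T where X is any residue except proline) being exposed to the ER lumen. As a minimal sketch, the Python snippet below scans a sequence for such sequons, using as input the OPG tag sequence quoted in the materials and methods (MNGTEGPNFYVPFSNKTG). The helper names `find_sequons` and `max_glycans` are hypothetical. A sequon only marks a potential site, so the counts are upper bounds, not predictions of what is observed on a gel.

```python
import re

# OPG tag sequence as given in the materials and methods (upper-cased here).
OPG_TAG = "MNGTEGPNFYVPFSNKTG"

# Lookahead so that overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of the Asn in every N-X-S/T (X != P) motif."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


def max_glycans(lumenal_tags, sites_per_tag):
    """Upper bound on tag-derived N-glycans for a given number of lumenal tags."""
    return lumenal_tags * sites_per_tag


sites = find_sequons(OPG_TAG)
print(sites)  # -> [2, 15]

# For a doubly tagged protein, compare zero, one or two tags in the lumen.
for lumenal in (0, 1, 2):
    print(lumenal, "lumenal tag(s): at most", max_glycans(lumenal, len(sites)), "N-glycans")
```

Under these assumptions a tag facing the cytosol contributes no glycans, which is why the number of N-glycans on a tagged protein can be read as a report of which termini reached the lumen.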
orf assumes a lumenal-facing hairpin topology in er-derived microsomes cell-based studies of the orf protein from sars-cov- suggest it has an unusual hairpin topology with both its n- and c-termini located on the exoplasmic side of the host cell membrane, to which it binds via an n-terminal amphipathic helix (netland et al., ). to independently determine the membrane topology of sars-cov- orf , we prepared versions with opg tags at both its n- and c-termini, or single tagged equivalents (see fig. b, schematics, opg -orf - opg , opg -orf and orf -opg ). following membrane insertion, doubly tagged opg -orf -opg shows significant amounts of species with - and - n-linked glycans (fig. b). this pattern confirms that the sars-cov- orf protein assumes a ‘hairpin’ conformation in the er membrane with both its n- and c-termini in the lumen (fig. b, opg -orf -opg ). these - and -n- glycan bearing opg -orf -opg species are also resistant to extraction with alkaline sodium carbonate buffer (fig. s d) and protected from added protease (fig. s e), further indicating that the majority of the orf protein is stably associated with the er membrane in a ‘hairpin’ (nexo/cexo) topology. consistent with this unusual membrane topology, we find no indication that the membrane insertion of any of our opg -tagged orf variants is reduced by ipom-f, strongly suggesting that its association with the inner leaflet of the er membrane does not require protein translocation via the central channel of the sec translocon (gérard et al., ; o’keefe et al., submitted). we noted a sub-set of opg -orf -opg species bearing only a single n-glycan was also clearly present in the membrane-associated fraction with or without ipom-f treatment (fig. b, opg -orf -opg , see gly). based on comparison to singly opg -tagged variants (fig.
b), we conclude that opg -orf -opg - gly has its n-terminus in the er lumen, where only one of its two consensus sites is efficiently n-glycosylated (cf. nilsson and von heijne, ), whilst its c-terminus is either er lumenal but non-glycosylated or remains on the cytosolic side of the membrane. in the latter case, it may be that, in addition to its hairpin topology, some fraction of orf is integrated into er-derived microsomes as a type iii tmp (cf. fig. s e; see also netland et al., ) that is resistant to ipom-f inhibition (this study; zong et al., ; o’keefe et al., submitted). the molecular basis for sars-cov- protein sensitivity to ipom-f having ascertained that ipom-f inhibits the er membrane translocation/insertion of the viral orf and s proteins, but not of the orf , e or m proteins (fig. c), we next investigated the molecular basis for this selectivity. for these studies we employed semi-permeabilised (sp) mammalian cells, depleted of specific membrane components via sirna-mediated knockdown, as our source of er membrane (fig. a; wilson et al., ). consistent with our recent work (o’keefe et al., submitted), and based on the quantitative immunoblotting of target and control gene products (fig. s a-c), we selectively depleted hela cells for core components of the sec translocon (sec α-kd, ~ % reduction), the emc (emc -kd, ~ % reduction) and both together (sec α+emc -kd, ~ % and ~ % reduction) prior to semi-permeabilisation with digitonin and use for in vitro er translocation assays.
following the analysis of total opg -tagged translation products recovered by immunoprecipitation, we found that: i) the s protein and a truncated derivative were both more strongly affected by the depletion of sec α than of emc (fig. b, fig. s d); ii) the orf protein was likewise strongly affected by sec α depletion but also sensitive to emc depletion (fig. c); iii) the e protein showed diminished insertion efficiency after knock-down of sec α and emc , although the latter had a more pronounced effect (fig. d). in each case, the combined knockdown of sec α and emc resulted in a reduction of membrane insertion that was either comparable to, or greater than, that achieved following the knock-down of sec α alone (figs. b to d). for the orf protein, the total level of n-glycosylated opg -orf -opg species was unaffected by any knockdown condition tested (fig. e). however, we note a marked increase in the proportion of potentially mis-inserted opg -orf -opg - gly species, particularly after co-depletion of emc and sec α (see fig. e; fig. s e). we speculate that the unusual hairpin topology of the orf protein may be attributed to the emc and sec complex acting in concert to provide an ipom-f insensitive pathway for protein translocation across the er membrane (o’keefe et al., submitted). perturbation of this pathway seemingly increases the potential for orf to mis-insert (cf. chitwood et al., ), perhaps as a consequence of disruption to the translocation of its c-terminus (fig. s e).
taken together, our data establish that, analogous to human membrane and secretory proteins, the principal molecular basis for the ipom-f-sensitivity of the sars-cov- orf and s proteins is their dependence on sec -mediated protein translocation into and across the er membrane. in contrast, the e, m, and orf proteins appear capable of exploiting one or more alternative membrane insertion/translocation pathways that can bypass the translocase activity of the sec complex. these alternatives most likely include a recently described route for type iii tmp insertion that requires the insertase function of the emc (o’keefe et al., submitted), which our data suggest is also sufficient to confer ipom-f-resistance to the viral e protein and at least the first tm-span of the viral m protein. concluding remarks we conclude that sec -selective protein translocation inhibitors like ipom-f hold promise as broad-spectrum antivirals that may exert a therapeutic effect by selectively inhibiting the er translocation of viral and/or host proteins which are crucial to viral infection and propagation (mast et al., ). in the context of sars-cov- , integration of the viral s protein and its host cell receptor, ace , into the er membrane is significantly reduced by ipom-f (fig. c, b). likewise, translocation of the viral orf protein across the er membrane and into its lumen is substantially diminished (fig. c, c). the binding of the viral s protein to cell surface ace is a key step in host cell infection (drew and janes, ), whilst orf may protect sars-cov- infected cells against host cytotoxic t lymphocytes (zhang et al., ), making all three of these proteins viable therapeutic targets (drew and janes, ; li et al., ; young et al., ). like other small molecule inhibitors that target fundamental cellular pathways (bojkova et al. ), the broad-ranging effects of sec inhibitors on host cell membrane and secretory protein synthesis (morel et al., ; zong et al.
), including the strong in vitro effect of ipom-f on ace biogenesis (cf. grob et al. ), present an obvious hurdle to their future use. nevertheless, given that ipom-f is a potent inhibitor of sec -mediated protein translocation in cell culture models (zong et al., ), and appears well tolerated in mice (zong et al., ), we propose that future studies investigating its effect on sars-cov- infection and propagation in cellular models are clearly warranted (cf. bojkova et al. ). materials and methods ipom-f and antibodies ipom-f was synthesised as previously described (zong et al., in press). antibodies used to validate sec and/or emc subunit depletions in sp cells (fig. s ) were purchased from santa cruz biotechnology (goat polyclonal anti-lmnb (clone m- , sc- )), bethyl laboratories (rabbit polyclonal anti-emc (a - -a)), abcam (rabbit polyclonal anti-emc , (ab )), gifted by sven lang and richard zimmermann (university of saarland, homburg, germany, rabbit anti-sec α) or as previously described (mouse monoclonal anti-opg tag (mckenna et al., ) and rabbit polyclonal anti-ost (wilson et al., )).
dna constructs
the cdna for human ace (uniprot: q byf ) was purchased from sino biological (hg -m). cdnas encoding the sars-cov- genes for orf , orf and the e, m and s proteins (uniprot: p dtc , p dtc , p dtc , p dtc , p dtc respectively) were kindly provided by nevan krogan (ucsf, us) (gordon et al. ), amplified by pcr and subcloned into the pcdna vector, and the constructs were validated by dna sequencing (gatc, eurofins genomics). orf -opg , orf -opg , m-opg and s-opg were generated by inserting the respective cdnas in frame between the nhei and aflii sites of a pcdna /frt/v -his vector (invitrogen) containing a c-terminal opg tag (mngtegpnfyvpfsnktg). opg -e was generated by cloning the cdna encoding the e-protein into the same pcdna -opg vector using the kpni and bamhi sites and deleting the stop codon after the opg tag by site-directed mutagenesis (stratagene quikchange, agilent technologies). the n-terminal opg -tag of opg -orf -opg was inserted by site-directed mutagenesis of orf -opg using the relevant forward and reverse primers (integrated dna technologies). linear dna templates were generated by pcr and mrna was transcribed using t polymerase.
sirna-mediated knockdown and sp cell preparation
hela cells (human epithelial cervix carcinoma cells) were cultured in dmem supplemented with % (v/v) fbs and maintained in a % co humidified incubator at °c. knockdown of target genes was performed as previously
described (o’keefe et al., submitted) using nm (final concentration) of either control sirna (on-targetplus non-targeting control pool; dharmacon), sec a sirna (sec α-kd, ge healthcare, sequence aacacugaaaugucuacguuuuu) or mmgt sirna (emc -kd, thermofisher scientific, s ), together with interferin (polyplus, - ) as described by the manufacturer. h post-initial transfection, cells were semi-permeabilised using μg/ml high purity digitonin (calbiochem) and treated with . u nuclease s micrococcal nuclease from staphylococcus aureus (sigma-aldrich, ) as previously described (o’keefe et al., submitted; wilson et al., ). sp cells lacking endogenous mrna were resuspended ( x sp cells/ml as determined by trypan blue (sigma-aldrich, t ) staining) in khm buffer ( mm koac, mm mg(oac) , mm hepes-koh ph . ) prior to analysis by western blot, or inclusion in translation master mixes such that each translation reaction contained x cells/ml.
in vitro er import assays
standard translation and membrane translocation/insertion assays, supplemented with nuclease-treated canine pancreatic microsomes (from stock with od = /ml) or sirna-treated sp hela cells, were performed in nuclease-treated rabbit reticulocyte lysate (promega) as previously described (zong et al., ; o’keefe et al., submitted): namely in the presence of easytag express s protein labelling mix containing [ s] methionine (perkin elmer) ( . mbq; . tbq/mmol), μm amino acids minus methionine (promega), µm ipom-f, or an equivalent volume of dmso, . % (v/v) er-derived microsomes or sp cells and ~ % (v/v) of in vitro transcribed mrna (~ ng/μl) encoding the relevant precursor protein. microsomal translation reactions ( μl) were performed for min at °c whereas those using sp hela cells were performed on a . x scale ( μl translation reactions) for h at °c. as the s protein was most efficiently synthesised using the tnt® coupled system (fig.
s b), import assays of the comparatively higher molecular weight ace and s proteins ( μl reactions) were both performed using the tnt® coupled transcription/translation system (promega) for min at °c as described by the manufacturer (~ ng/μl cdna, µm ipom-f or an equivalent volume of dmso, % (v/v) er-derived microsomes or sp cells). all translation reactions were finished by incubating with . mm puromycin for min at °c to ensure translation termination and ribosome release of newly synthesised proteins prior to analysis.
recovery and analysis of radiolabelled products
following puromycin treatment, microsomal membrane-associated fractions were recovered by centrifugation through an μl high-salt cushion ( . m sucrose, . m koac, mm mg(oac) , mm hepes-koh, ph . ) at , g for min at °c and the pellet suspended directly in sds sample buffer. to confirm the topology of orf (fig. s ), the membrane-associated fraction of the doubly-opg -tagged form (opg -orf -opg ) was resuspended in khm buffer ( μl) and subjected to either carbonate extraction ( . m na co , ph . ) (mckenna et al., ) or a protease protection assay using trypsin ( μg/ml) with or without . % triton x- (ray-sinha et al., ) prior to suspension in sds sample buffer. for translation reactions using sp cells, the total reaction material was diluted with nine volumes of triton immunoprecipitation buffer ( mm tris-hcl, mm nacl, mm edta, % triton x- , mm pmsf, mm methionine (to prevent background from the radiolabelled methionine), ph . ).
samples were incubated under constant agitation with an antibody recognising the opg epitope ( : dilution) for h at °c to recover both the membrane-associated and non-targeted nascent chains. samples were next incubated under constant agitation with % (v/v) protein-a-sepharose beads (genscript) for a further h at °c before recovery by centrifugation at , g for min. protein-a-sepharose beads were washed twice with triton immunoprecipitation buffer prior to suspension in sds sample buffer. where indicated, samples were treated with u of a form of endoglycosidase h that does not co-migrate with, and hence potentially distort, the radiolabelled products when resolved: endoglycosidase hf (translation products of ~ - kda; new england biolabs, p s) or endoglycosidase h (translation products of ~ - kda protein substrates; new england biolabs, p s). all samples were solubilised for h at °c and then sonicated prior to resolution by sds-page ( % or % page, v, - min). gels were fixed for min ( % meoh, % acoh) and dried for h at °c, and radiolabelled products were visualised using a typhoon fla- (ge healthcare) following exposure to a phosphorimaging plate for - h.
western blotting
following semi-permeabilisation, aliquots of sirna-treated hela cells were suspended in sds sample buffer, denatured for h at °c and sonicated prior to resolution by sds-page ( % or % page, v, - min). following transfer to a pvdf membrane in transfer buffer ( . m tris, . m glycine, % meoh) at ma for .
h, pvdf membranes were incubated in x casein blocking buffer ( x stock from sigma-aldrich, b ) made up in tbs, incubated with appropriate primary antibodies ( : or : dilution) and processed for fluorescence-based detection as described by li-cor biosciences using appropriate secondary antibodies (irdye rd donkey anti-goat, irdye rd donkey anti-rabbit, irdye cw donkey anti-mouse) at : , dilution. signals were visualised using an odyssey clx imaging system (li-cor biosciences).
quantitation and statistical analysis
bar graphs depict either the efficiency of membrane translocation/insertion, calculated as the ratio of n-glycosylated protein relative to the amount of non-n-glycosylated protein (fig. - ), or the efficiencies of sirna-mediated knockdown in sp cells, calculated as a proportion of the protein content when compared to the nt control (fig. s ), with all control samples set to %. normalised values were used for statistical comparison (one-way or two-way anova; df and f values are shown in each figure as appropriate and the multiple comparisons test used is indicated in the appropriate figure legend). statistical significance is given as n.s., non-significant > . ; *, p < . ; **, p < . ; ***, p < . ; ****, p < . .
acknowledgements
we thank quentin roebuck for technical assistance, nevan krogan (ucsf) for sars-cov- plasmids, sven lang (university of saarland) for sec α antisera, and belinda hall and rachel simmonds (university of surrey) for useful discussions.
we are indebted to richard zimmermann (university of saarland) for catalyzing sars-cov- related discussions amongst the er research community.
competing interests
the authors declare no competing interests.
author contributions
k.b.d., g.z. and h.s. participated in synthesis of ipom-f and w.q.s. supervised the synthesis; p.r. generated sars-cov- plasmids; s.o’k. performed site-directed mutagenesis and experiments; s.o’k. and s.h. designed the study, analysed the data and wrote the manuscript.
funding
this work was supported by a wellcome trust investigator award in science /z/ /z (s.h.), an area grant r gm - a from the national institute of general medical sciences of the national institutes of health (nih) and a ball state university (bsu) provost startup award (w.q.s.).
supplementary information
supplementary information fig. s and fig. s accompany this report.
references
adalja, a., and inglesby, t. ( ). broad-spectrum antiviral agents: a crucial pandemic tool. exp. rev. anti-infect. ther. , - .
bojkova, d., klann, k., koch, b., widera, m., krause, d., ciesek, s., cinatl, j., and münch, c. ( ). proteomics of sars-cov- infected host cells reveals therapy targets. nature. , - .
cantuti-castelvetri, l., ojha, r., pedro, l. d., djannatian, m., franz, j., kuivanen, s., van der meer, f., kallio, k., kaya, t., anastasina, m., et al. ( ). neuropilin- facilitates sars-cov- cell entry and infectivity. science. , - .
chitwood, p. j., juszkiewicz, s., guna, a., shao, s., and hegde, r. s. ( ).
emc is required to initiate accurate membrane protein topogenesis. cell. , - .
daly, j. l., simonetti, b., klein, k., chen, k.-e., kavanagh williamson, m., antón-plágaro, c., shoemark, d. k., simón-gracia, l., bauer, m. et al. ( ). neuropilin- is a host factor for sars-cov- infection. science. , - .
drew, e. d., and janes, r. w. ( ). identification of a druggable binding pocket in the spike protein reveals a key site for existing drugs potentially capable of combating covid- infectivity. bmc mol. cell biol. , .
duart, g., garcía-murria, m. j., grau, b., acosta-cáceres, j. m., martínez-gil, l., and mingarro, i. ( ). sars-cov- envelope protein topology in eukaryotic membranes. open biol. , .
firth, a. e. ( ). a putative new sars-cov protein, c, encoded in an orf overlapping orf a. j. gen. virol. , - .
gérard, s. f., hall, b. s., zaki, a. m., corfield, k. a., mayerhofer, p. u., costa, c., whelligan, d. k., biggin, p. c., simmonds, r. e., and higgins, m. k. ( ). structure of the inhibited state of the sec translocon. mol. cell. , - .e .
gordon, d. e., jang, g. m., bouhaddou, m., xu, j., obernier, k., white, k. m., o’meara, m. j., rezelj, v. v., guo, j. z., swaney, d. l., tummino, t. a. et al. ( ). a sars-cov- protein interaction map reveals targets for drug repurposing. nature. , - .
grob, s., jahn, c., cushman, s., bär, c., and thum, t. ( ). sars-cov- receptor ace -dependent implications on the cardiovascular system: from basic science to clinical implications. j. mol. cell. cardiol. , - .
heaton, n.
s., moshkina, n., fenouil, r., gardner, t. j., aguirre, s., shah, p. s., zhao, n., manganaro, l., hultquist, j. f., noel, j. et al. ( ). targeting viral proteostasis limits influenza virus, hiv, and dengue virus infection. immunity. , - .
li, j.-y., liao, c.-h., wang, q., tan, y.-., luo, r., qiu, y., and ge, x.-y. ( ). the orf , orf and nucleocapsid proteins of sars-cov- inhibit type i interferon signalling pathway. virus res. , .
luesch, h., and paavilainen, v. o. ( ). natural products as modulators of eukaryotic protein secretion. nat. prod. rep. , - .
mast, f. d., navare, a. t., van der sloot, a. m., coulombe-huntington, j., rout, m. p., baliga, n. s., kaushansky, a., chait, b. t., aderem, a., rice, c. m. et al. ( ). crippling life support for sars-cov- and other viruses through synthetic lethality. j. cell. biol. , e .
mckenna, m., simmonds, r. e., and high, s. ( ). mechanistic insights into the inhibition of sec -dependent co- and post-translational translocation by mycolactone. j. cell sci. , - .
morel, j. d., paatero, a. o., wei, j., yewdell, j. w., guenin-macé, l., van haver, d., impens, f., pietrosemoli, n., paavilainen, v. o., and demangel, c. ( ). proteomics reveals scope of mycolactone-mediated sec blockade and distinctive stress signature. mol. cell prot. , - .
naqvi, a. a. t., fatima, k., mohammad, t., fatima, u., singh, i. k., singh, a., atif, a. m., hariprasad, g., hasan, g. m., and hassan, m. i. ( ). insights into sars-cov- genome, structure, evolution, pathogenesis and therapies: structural genomics approach. biochim.
biophys. acta. mol. basis dis. , .
netland, j., ferraro, d., pewe, l., olivares, h., gallagher, t., and perlman, s. ( ). enhancement of murine coronavirus replication by severe acute respiratory syndrome coronavirus protein requires the n-terminal hydrophobic region but not c-terminal sorting motifs. j. virol. , - .
nilsson, i. m., and von heijne, g. ( ). determination of the distance between the oligosaccharyltransferase active site and the endoplasmic reticulum membrane. j. biol. chem. , - .
ray-sinha, a., cross, b. c. s., mironov, a., wiertz, e., and high, s. ( ). endoplasmic reticulum-associated degradation of a degron-containing polytopic membrane protein. mol. membr. biol. , - .
o’keefe, s., zong, g., duah, k. b., andrews, l. e., shi, w. q., and high, s. ( ). type iii transmembrane protein integration requires both the emc and sec complex. submitted.
sicari, d., chatziioannou, a., koutsandreas, t., sitia, r., and chevet, e. ( ). role of the early secretory pathway in sars-cov- infection. j. cell. biol. , e .
shah, p. s., link, n., jang, g. m., sharp, p. p., zhu, t., swaney, d. l., johnson, j. r., von dollen, j., ramage, h. r., satkamp, l. et al. ( ). comparative flavivirus-host protein interaction mapping reveals mechanisms of dengue and zika virus pathogenesis. cell. , - .e .
von heijne, g. ( ). the membrane protein universe: what’s out there and why bother? j. intern. med. , - .
walls, a. c., park, y.-j., tortorici, m. a., wall, a., mcguire, a. t., and veesler, d. ( ).
structure, function and antigenicity of the sars-cov- spike glycoprotein. cell. , p - .e .
warner, f. j., lew, r. a., smith, a. i., lambert, d. w., hooper, n. m., and turner, a. t. ( ). angiotensin-converting enzyme (ace ), but not ace, is preferentially localised to the apical surface of polarised kidney cells. j. biol. chem. , - .
wilson, c. m., and high, s. ( ). ribophorin i acts as a substrate-specific facilitator of n-glycosylation. j. cell. sci. , - .
young, b. e., fong, s.-w., chan, y. h., mak, t.-m., ang, l. w., anderson, d. e., yi-pin lee, c., naqiah amrun, s., lee, b., shan goh, y. et al. ( ). effects of a major deletion in the sars-cov- genome on the severity of infection and inflammatory response: an observational cohort study. lancet. , - .
zhang, y., zhang, j., chen, y., luo, b., yuan, y., huang, f., yang, t., yu, f., liu, j., song, z. et al. ( ). the orf protein of sars-cov- mediates immune evasion through potently downregulating mhc- . biorxiv. doi: . / . . .
zhou, p., yang, x.-l., wang, x.-g., hu, b., zhang, l., zhang, w., si, h.-r., zhu, y., li, b., huang, c.-l. et al. ( ). a pneumonia outbreak associated with a new coronavirus of probable bat origin. nature. , - .
zhu, n., zhang, d., wang, w., li, x., yang, b., song, j., zhao, x., huang, b., shi, w., lu, r., et al. ( ). a novel coronavirus from patients with pneumonia in china, . n. eng. j. med. , - .
zong, g., hu, z., o’keefe, s., tranter, d., iannotti, m. j., baron, l., hall, b., corfield, k., paatero, a., henderson, m. et al. ( ).
ipomoeassin f binds sec α to inhibit protein translocation. j. am. chem. soc. , - .
zong, g., hu, z., duah, k. b., andrews, l. e., zhou, j., o’keefe, s., whisenhunt, l., shim, j. s., du, y., high, s., et al. ( ). ring-expansion leads to a more potent analogue of ipomoeassin f. j. org. chem. doi: . /acs.joc. c
figure legends
fig. . ipom-f as a potential inhibitor of sars-cov- viral protein synthesis. (a) schematic of (+) ssrna genome architecture of sars-cov- ( nt) containing ’ capped mrna with a leader sequence (ls), ’ end poly-a tail, ’ and ’ utrs and open reading frames (orfs): orf a, orf b, spike (s), orf a, envelope (e), membrane (m), orf , orf , orf , nucleoprotein (n) and orf (firth, ; naqvi et al., ). an important mode of sars-cov- host entry proceeds via interaction of the viral s protein with human angiotensin-converting enzyme (ace ) (walls et al., ). (b) structure of ipomoeassin-f (ipom-f), a small molecule inhibitor of sec -mediated protein translocation. (c) ipom-f efficiently blocks membrane translocation of secretory proteins and insertion of single-pass type i and type ii tmps, but not insertion of type iii tmps or tail-anchored (ta) proteins. sa denotes a signal anchor. (d) based on known/predicted membrane topology of sars-cov- proteins, and sensitivity of comparable host cell proteins (zong et al., ; o’keefe et al., submitted), likely sensitivity to ipom-f was anticipated.
fig. . ipom-f selectively inhibits the er membrane translocation of sars-cov- proteins.
(a) schematic of in vitro er import assay using pancreatic microsomes. following translation, fully translocated/membrane inserted radiolabelled precursor proteins are recovered and analysed by sds-page and phosphorimaging. n-glycosylated species were confirmed by treatment with endoglycosidase h (endo h). (b) protein precursors of the human angiotensin-converting enzyme (ace ) and opg -tagged versions of the sars-cov- orf (orf -opg ), spike (s-opg ), envelope (opg -e), membrane (m-opg ) and orf (a doubly-opg tagged version, opg -orf -opg , and two singly-opg tagged forms, opg -orf and orf -opg , with predominant n-glycosylated species in bold) were synthesised in rabbit reticulocyte lysate supplemented with er microsomes without or with ipom-f (lanes and ). phosphorimages of membrane-associated products resolved by sds-page with representative substrate outlines are shown. n-glycosylation was used to measure the efficiency of membrane translocation/insertion, and n-glycosylated (x-gly) versus non-n-glycosylated ( gly) species were identified using endo h (see lane ). (c) the relative efficiency of membrane translocation/insertion in the presence of ipom-f was calculated using the ratio of n-glycosylated protein to non-glycosylated protein, relative to the dmso treated control (set to % efficiency).
quantitations are given as mean±s.e.m for independent translation reactions performed in triplicate (n= ) and statistical significance (one-way anova, df and f values shown in the figure) was determined using dunnett’s multiple comparisons test. statistical significance: n.s., non-significant > . ; ****, p < . .
fig. . sars-cov- proteins are variably dependent on the sec complex and/or the emc for er membrane translocation/insertion. (a) schematic of in vitro er import assay using control sp cells, or those depleted of a subunit of the sec complex and/or the emc via sirna. following translation, opg -tagged translation products (i.e. membrane-associated and non-targeted nascent chains) were immunoprecipitated, resolved by sds-page and analysed by phosphorimaging. opg -tagged variants of the sars-cov- (b) spike (s-opg ), (c) orf (orf -opg ), (d) envelope (opg -e) and (e) orf (opg -orf -opg ) species (labelled as for fig. ) were synthesised in rabbit reticulocyte lysate supplemented with control sp cells (lanes - ) or those with impaired sec and/or emc function (lanes - ). radiolabelled products were recovered and analysed as in (a). membrane translocation/insertion efficiency was determined using the ratio of the n-glycosylation of lumenal domains, identified using endo h (eh, lane ), relative to the nt control (set to % translocation/insertion efficiency). quantitations (n= ) and statistical significance (two-way anova, df and f values shown in the figure) were determined as for figure . statistical significance: n.s., non-significant > . ; *, p < . ; **, p < . ; ***, p < . ; ****, p < . .
supplementary information for
ipomoeassin-f inhibits the in vitro biogenesis of the sars-cov- spike protein and its host cell membrane receptor
sarah o’keefe, peristera roboti, kwabena b. duah, guanghui zong, hayden scheider, wei q. shi and stephen high
fig. s .
additional studies using er microsomes, related to figures and . (a) non-tagged (lanes - ) and opg -tagged (lanes - ) versions of the sars-cov- spike protein (s, s-opg ), orf (orf , orf -opg ) and membrane protein (m, m-opg ) were synthesised in rabbit reticulocyte lysate supplemented with er-derived canine pancreatic microsomes in the absence and presence of ipom-f (lanes and ). phosphorimages of membrane-associated products resolved by sds-page together with representative substrate outlines are shown. n-glycosylated (x-gly) versus non-n-glycosylated ( gly) species were identified by treatment with endoglycosidase h (endo h, lanes and ). (b) the s protein was synthesised in a flexi® rabbit reticulocyte system with varying concentrations of magnesium acetate (lanes - ) and a tnt® coupled system (lane ) in the absence of er-derived microsomes. % of the total reaction material was resolved by sds-page and visualised by phosphorimaging. (c) the er import of truncated variants of the s protein (s-short, s-s.s.-tmd, s-half-opg ) was analysed as described for (a). (d) the membrane-associated products of the doubly tagged form of orf (opg -orf -opg ) were synthesised as in (a) and, following treatment with sodium carbonate buffer and centrifugation, the pellet, enriched for membrane-integrated material, and supernatant, largely containing peripherally membrane-associated material, were analysed for opg -orf -opg . (e) the membrane-associated products of opg -orf -opg were treated with trypsin in the absence or presence of triton- (tx- , lanes - ).
fig. s . validation of sec and/or emc subunit depletions in sp cells, related to figure . (a) the effects of transfecting hela cells with non-targeting (nt; lane ), sec d-targeting (lane ), emc -targeting (lane ) and sec d+emc -targeting (lane ) sirnas were determined after semi-permeabilisation by immunoblotting for target genes (sec d, emc ). controls to assess destabilisation of the wider emc complex (emc and emc ), any effect on the n-glycosylation machinery (the er-resident kda subunit of the oligosaccharyl-transferase complex (ost )) and the quantity of sp cells used in each experiment (the nuclear protein lamin-b (lmnb )) are also shown. (b) the efficiencies of sirna-mediated knockdown (bold) were calculated as a proportion of the signal intensity obtained with the nt control (set as %). quantitations are given as mean±s.e.m for three separate sirna treatments (n= ), with statistical significance of sirna-mediated knockdowns (two-way anova, df and f values shown in the figure) determined using a multiple comparisons test. statistical significance is given as n.s., non-significant > . ; *, p < . ; ****, p < . . (c) knockdown efficiencies (mean±s.e.m) for each of the target genes.
(d) a truncated variant of the s protein (s-half-opg ) was synthesised in rabbit reticulocyte lysate supplemented with sp cells with impaired sec complex and/or emc function and recovered by immunoprecipitation via the opg tag. radiolabelled products were resolved by sds-page and analysed by phosphorimaging. n-glycosylated ( -gly) versus non-n-glycosylated ( gly) species were identified by treatment with endoglycosidase h (endo h, lane ). (e) further analysis of the data presented in fig. e of the main text. here, the ratio of gly and gly bearing opg -orf -opg n-glycosylated species relative to the gly species present in the same sample was used as a proxy to estimate potential mis-insertion of the orf protein in sp cells with impaired sec complex and/or emc function relative to the nt control (set to % efficiency). quantitations are given as mean±s.e.m for independent translation reactions from separate sirna treatments performed in triplicate (n= ) and statistical significance (two-way anova, df and f values shown in the figure) was determined using a multiple comparisons test. statistical significance is given as n.s., non-significant > . ; *, p < . ; ***, p < . .
dynamic closed states of a ligand-gated ion channel captured by cryo-em and simulations

urška rovšnik, yuxuan zhuang, björn o forsberg, marta carroni, linnea yvonnesdotter, rebecca j howard, erik lindahl

department of biochemistry and biophysics, science for life laboratory, stockholm university, solna, sweden
division of structural biology, wellcome centre for human genetics, university of oxford, ox bn oxford, united kingdom
department of applied physics, science for life laboratory, kth royal institute of technology, solna, sweden

corresponding author: erik lindahl, science for life laboratory, department of biochemistry and biophysics, stockholm university, solna, sweden; erik.lindahl@scilifelab.se

abstract

ligand-gated ion channels are critical mediators of electrochemical signal transduction across evolution. biophysical and pharmacological characterization of these receptor proteins relies on high-quality structures in multiple, subtly distinct functional states. however, structural data in this family remain limited, particularly for resting and intermediate states on the activation pathway. here we report cryo-electron microscopy (cryo-em) structures of the proton-activated gloeobacter violaceus ligand-gated ion channel (glic) under three ph conditions. decreased ph was associated with improved resolution and sidechain rearrangements at the subunit/domain interface, particularly involving functionally important residues in the β –β and m –m loops. molecular dynamics simulations substantiated flexibility in the closed-channel extracellular domains relative to the transmembrane ones, and supported electrostatic remodeling around e and e in proton-induced gating. exploration of secondary cryo-em classes further indicated a low-ph population with an expanded pore.
these results support a dissection of protonation and activation steps in ph-stimulated conformational cycling in glic, including interfacial rearrangements largely conserved in the pentameric channel family.

.cc-by . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by/ . /

introduction

pentameric ligand-gated ion channels are major mediators of fast synaptic transmission in the mammalian nervous system, and serve a variety of biological roles across evolution [ ]. representative x-ray and cryo-electron microscopy (cryo-em) structures in this family have confirmed a five-fold pseudosymmetric architecture, conserved from prokaryotes to humans [ ]. the extracellular domain (ecd) of each subunit contains β-strands β –β , with the characteristic cys- or pro-loop [ ] connecting β –β , and loops a–f enclosing a canonical ligand-binding site [ ] at the interface between principal and complementary subunits. the transmembrane domain (tmd) contains α-helices m –m , with m lining the channel pore, and an intracellular domain of varying length ( – residues) inserted between m and m . extracellular agonist binding is thought to favor subtle structural transitions from resting to intermediate or ‘flip’ states [ ], opening of a transmembrane pore [ ], and in most cases a refractory desensitized phase [ ]. accordingly, a detailed understanding of pentameric channel biophysics and pharmacology depends on high-quality structural templates in multiple functional states. however, high-resolution structures can be biased by stabilizing measures such as ligands, mutations, and crystallization, leaving open questions as to the wild-type activation process.
as a model system in this family, the gloeobacter violaceus proton-gated ion channel (glic) has historically offered both insights and limitations [ ]. this prokaryotic receptor has been functionally characterized in multiple cell types [ ] and crystallizes readily under activating conditions (ph ≤ . ) [ ], [ ], producing apparent open structures up to . Å resolution [ ] in the absence and presence of various ligands [ ]–[ ] and mutations [ ]–[ ]. additional low-ph x-ray structures of glic have been reported in lipid-modulated [ ] and so-called locally closed states [ ]–[ ], with a hydrophobic constriction at the pore midpoint (i , i ’ in prime notation) as predicted for closed channels throughout the family [ ]. crystallography at neutral ph has also been reported, but only to relatively low resolution ( . Å), suggesting a resting state with a relatively expanded, twisted ecd as well as a contracted pore [ ], [ ]. alternative structural methods have supported the existence of multiple nonconducting conformations [ ]–[ ], and biochemical studies have implicated titratable residues including e and e in ph sensing [ ], [ ], [ ], [ ]. however, due in part to limited structural data for wild-type glic in resting, intermediate, or desensitized states, the mechanism of proton gating remains unclear. here, we report single-particle cryo-em structures and molecular dynamics (md) simulations of glic at ph , , and . taking advantage of the relatively flexible conditions accessible to cryo-em, we resolve multiple closed structures, distinct from those previously reported by crystallography.
we find rearrangements of e and e differentiate deprotonated versus protonated conditions, providing a dynamic rationale for proton-stimulated remodeling. classification of cryo-em data further indicated a minority population with a contracted ecd and expanded pore. these results support a dissection of protonation and activation steps in ph-stimulated conformational cycling, by which glic preserves a general gating pathway via interfacial electrostatics rather than ligand binding.

results

differential resolution of glic cryo-em structures with varying ph

to characterize the resting state of the prokaryotic pentameric channel glic, we first obtained single-particle cryo-em data under resting conditions (ph ), resulting in a map to . Å overall resolution (fig a–b, fig ev , appendix fig s , appendix fig s , table ). local resolution was between . and . Å in the tmd, including complete backbone traces for all four transmembrane helices. sidechains in the tmd core were clearly resolved (fig ev a), including a constriction at the i hydrophobic gate (i ’, . Å cβ-atom radius), consistent with a closed pore. whereas some extracellular regions were similarly well resolved (fig ev b), local resolution in the ecd was generally lower (fig b), with some atoms that could not be definitively built in the β –β loop, β –β loop (loop f), and at the apical end of the ecd (fig b). glic has been thoroughly documented as a proton-gated ion channel, conducting currents in response to low extracellular ph with half-maximal activation around ph [ ].
taking advantage of the flexible buffer conditions accessible to cryo-em, we obtained additional reconstructions under partial and maximal (ph and ph ) activating conditions, producing maps to . Å and . Å, respectively (fig c–d, fig ev , appendix fig s , appendix fig s ). overall map quality improved at lower ph, though local resolution in the tmd remained high relative to the ecd (fig c–d). as a partial check for our map comparisons, we also selected random subsets containing equivalent numbers of particles from each dataset; we found the ph- and ph- datasets still produced higher-quality reconstructions than those at ph (appendix fig s ), indicating that differential resolution could not be trivially attributed to data quantity. surprisingly, backbone alignments of models at both ph and ph indicated close fits to the ph- model (root mean-squared deviation over non-loop cα atoms, rmsd ≤ . Å) in both the ecd and tmd, including a closed conformation of the transmembrane pore (fig b–d, fig a). all three models deviated moderately from resting (pdb id: npq, ecd rmsd ≤ . Å, tmd rmsd ≤ . Å) but further from open x-ray structures (pdb id: hfi, ecd rmsd ≤ . Å, tmd rmsd ≤ . Å), suggesting systematic differences in em versus crystallized conditions, as well as general alignment to a conserved closed-state backbone. still, variations in local resolution and sidechain orientation indicated ph-dependent conformational changes at the subunit-domain interface, as described below.

sidechain rearrangements in low-ph structures

in the ecd, differential resolution was notable in the β –β loop, particularly in the principal proton-sensor [ ], [ ] residue e . at ph and ph , little definitive density was associated with this sidechain (fig b, left, center); conversely at ph , it clearly extended towards the complementary loop f, forming a possible hydrogen bond with t ( . Å donor-acceptor; fig b, right).
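the backbone comparisons in this section are reported as rmsd over non-loop cα atoms. a minimal sketch of that metric is given below, assuming the two models have already been superposed in a common frame and that cα coordinates are supplied as dictionaries keyed by residue id; the function name `ca_rmsd` and the `exclude` argument (standing in for loop residues) are hypothetical, and this is not the authors' alignment procedure.

```python
import math


def ca_rmsd(coords_a, coords_b, exclude=frozenset()):
    """RMSD over matched C-alpha atoms, skipping residue ids in `exclude`.

    coords_a, coords_b: dicts mapping residue id -> (x, y, z) in angstroms.
    Assumption: both models are already superposed in the same frame.
    """
    # Compare only residues present in both models and not excluded (e.g. loops).
    shared = sorted((set(coords_a) & set(coords_b)) - set(exclude))
    if not shared:
        raise ValueError("no shared residues to compare")
    squared_sum = 0.0
    for res in shared:
        ax, ay, az = coords_a[res]
        bx, by, bz = coords_b[res]
        squared_sum += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(squared_sum / len(shared))
```

excluding flexible loops before averaging keeps a few poorly resolved residues from dominating the deviation, which is presumably why the comparison is restricted to non-loop atoms.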
notably, this interaction mirrored that observed in open x-ray structures (fig ev ), despite the general absence of open-like backbone rearrangements in the cryo-em structure. at the midpoint of the same β –β loop, density surrounding basic residue k was similarly absent at ph and ph , but clearly defined a sidechain oriented down towards the tmd at ph (fig b). an additional acidic residue, d , could also be uniquely built at ph , oriented in towards the central vestibule. although not in direct contact with neighboring sidechains or domains, its enhanced definition further supported stabilization of the β –β loop. among seven other acidic residues (e , d , d , d , d , d , d ) associated with improved densities at low ph, only d has been shown to substantially influence channel properties [ ]; this residue is involved in an electrostatic network conserved across evolution, with substitutions decreasing channel expression as well as function [ ], suggesting its role may involve assembly or architecture more than proton sensitivity. in the tmd, rearrangements were observed particularly in the m –m loop, a region thought to couple ecd activation to tmd-pore opening. at ph , k at the loop midpoint oriented down toward the m helix, where it could form an intrasubunit hydrogen bond with e . conversely, at ph and ph , k reoriented out towards the complementary subunit.
residue k has been implicated in glic ecd-tmd coupling [ ], while e was shown to be an important proton sensor [ ]; indeed, rearrangement of k to an interfacial orientation is also evident in open x-ray structures, with an accompanying iris-like motion of the m –m region, including both k and e , outward from the channel pore (fig ev ). thus, sidechain arrangements in both the ecd and tmd were consistent with proton activation, while maintaining a closed pore.

remodeled electrostatic contacts revealed by molecular dynamics

to elucidate the basis for variations in local resolution (fig b–d) and sidechain orientation (fig b–d) described above, and assess whether it is a property of the state or experiment, we ran quadruplicate -µs all-atom md simulations of each cryo-em structure, embedded in a lipid bilayer and mm nacl. to further test the role of ph, we ran parallel simulations with a subset of acidic residues modified to approximate the probable protonation pattern under activating conditions, as previously described [ ]. for comparison, x-ray structures reported previously under resting and activating conditions were also simulated, at neutral and low-ph protonation states respectively. simulation rmsd converged to a similar degree within ns (fig ev a), with all except the open x-ray structure dehydrated around the hydrophobic gate (fig ev b). simulations of all three cryo-em structures exhibited elevated rmsd for the extracellular domains (rmsd< . Å) versus transmembrane regions (rmsd< .
Å), consistent with higher flexibility in the ecd; both domains exhibited similarly low rmsd in simulations of the open x-ray structure (fig ev a). in the ecd, simulations suggested a dynamic basis for ph-dependent interactions of the e proton sensor at the intersubunit β –β /loop-f interface (fig a–c). under resting (deprotonated) conditions, negatively charged e attracted cations from the extracellular medium, forming a direct electrostatic contact with na+ in > % of simulation frames (fig a–b). these environmental ions were not coordinated by other protein motifs in a rigid binding site, potentially explaining poorly resolved densities in this region in neutral-ph structures. cation coordination decreased slightly in the ph- structure even under deprotonated conditions, but was effectively eliminated in all simulations under activating (protonated) conditions. in parallel, mean cα-distances between e and the complementary t contracted in protonated simulations to values approaching the open x-ray structure (fig a, c), as the now-uncharged glutamate released na+ and became available to interact with the proximal threonine. in the tmd, simulations further substantiated gating-like rearrangements in the m –m loop (fig d–f). in simulations of the ph- structure under deprotonated conditions, the k sidechain was attracted down in each subunit towards the negatively charged e ; similar to the starting structure (fig c–d), these residues formed an electrostatic contact in > % of trajectory frames (fig d–e). in simulations of the ph- structure, k more often oriented out toward the subunit interface (fig d–e), also as seen in the corresponding structure (fig c–d). moreover, e -k interactions decreased in protonated versus deprotonated simulations of all three structures, with the prevalence of this contact in protonated simulations at ph (< %) approaching that in open x-ray structures (fig e).
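contact prevalence of the kind quoted above (the share of simulation frames in which two groups are in electrostatic contact) can be sketched as follows. this is an illustrative calculation only: the function name `contact_fraction`, the plain-list trajectory format and the distance cutoff are assumptions not taken from the source, and periodic boundary conditions are ignored for brevity.

```python
import math


def contact_fraction(frames, group_a, group_b, cutoff):
    """Fraction of frames in which any atom of group_a is within `cutoff`
    of any atom of group_b.

    frames: iterable of per-frame coordinate lists [(x, y, z), ...]
    group_a, group_b: atom indices into each frame (e.g. carboxylate
        oxygens versus sodium ions, or amine nitrogen versus carboxylate)
    cutoff: distance threshold in the same units as the coordinates
        (hypothetical value chosen by the user; no minimum-image correction)
    """
    n_frames = 0
    n_contact = 0
    for xyz in frames:
        n_frames += 1
        # A frame counts as "in contact" if at least one atom pair is close.
        in_contact = any(
            math.dist(xyz[i], xyz[j]) < cutoff
            for i in group_a
            for j in group_b
        )
        n_contact += int(in_contact)
    if n_frames == 0:
        raise ValueError("no frames supplied")
    return n_contact / n_frames
```

for example, with three frames in which the pair distance is 3.0, 5.0 and 3.9 units and a cutoff of 4.0, two of the three frames count as contacts.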
projecting the m –m loop conformations onto the two lowest principal component (pc) degrees of freedom further revealed distinct populations at ph , ph , and ph (fig f). the two dominant pcs for this motif were associated with flipping of k from a downward to outward orientation (pc ), and stretching of the loop across the subunit/domain interface (pc ). projected along these axes, structures determined in decreasing ph conditions increasingly approximated the open x-ray structure, particularly in protonated simulations. thus, in addition to substantiating differential stability in extracellular and transmembrane regions, md simulations offered a rationale for dynamic ph-dependent rearrangements at the subunit/domain interface.

minority classes suggest alternative states

compared to the best-quality reconstructions obtained at each ph (state , fig b–d), cryo-em data classification in all cases identified minority populations, indicating the presence of multiple conformations that could correspond to functionally relevant states. in particular, a minority class (state ) at ph was visibly contracted and rotated in the ecd relative to ph (state ) (fig ev a). although a complete atomic model could not be built at this resolution ( . Å), refinement of the ph- state- backbone into the state- density revealed systematic reductions in ecd spread and domain twist, echoing transitions from resting to open x-ray structures (fig ev b) [ ], [ ]. minority classes could also be reconstructed at ph and ph , although to lower resolution ( . Å and .
Å respectively), and with less apparent divergence from state in each condition (appendix fig s a–c). in the tmd, ph- state also exhibited a tilted conformation of the upper m helices, outward towards the complementary subunit and away from the channel pore relative to state (fig a–c). whereas the upper pore in state- models was almost indistinguishable from that of the resting x-ray structure (fig , appendix fig s a–c), in ph- state it transitioned substantially towards the open x-ray state (fig b). static pore profiles [ ] revealed expansion of ph- state at channel-facing residues s –i (s ’–i ’) (fig d). the open x-ray structure was initially even more expanded: md simulations of that state consistently converged to a more contracted pore at and above s ’; indeed, some open-state replicates sampled profiles overlapping ph- state (fig d), while remaining hydrated at the i ’ hydrophobic gate (fig ev b). in contrast, simulations of state- cryo-em and resting x-ray structures did not substantially contract in the upper pore (appendix fig s d–h). thus, minority classes indicated the presence of alternative functional states consistent with activating transitions at low ph.

discussion

structures of glic in this work represent the first reported by cryo-em, to our knowledge, covering multiple ph conditions and revealing electrostatic interactions at key subunit interfaces which are further substantiated by microsecond-scale md simulations.
our data support a multi-step model for proton activation, in which closed states are characterized by a relatively flexible expanded ecd and a contracted upper pore (fig a). protonation of both ecd (e ) and tmd (e ) glutamates relieves charge interactions associated with the resting state, enabling sidechain remodeling particularly in the β –β and m –m loops, without necessarily altering the backbone fold (fig b). further rearrangements of the backbone are proposed to retain protonated sidechain arrangements by contracting the ecd and expanding the tmd pore, as indicated both by a minority class in our low-ph cryo-em data (fig ), and by comparisons with apparent open x-ray structures (fig c). direct involvement of extracellular loops β –β and f in proton sensing proved consistent with several recent predictions. mutations at β –β residue e were among the most impactful of any acidic residues in previous scanning experiments [ ]. moreover, past spectroscopic studies showed the ph of receptor activation recapitulates the individual pka of this residue, implicating it as the key proton sensor [ ]. in contrast, mutations at k have not been shown to dramatically influence channel function; indeed, previous crosslinking with the m –m loop showed this position can either preserve or inhibit proton activation [ ], suggesting the improved definition we observed for this sidechain at low ph was more a byproduct of local remodeling than a determinant of gating. at e ’s closest contact, loop-f residue t , chemical labeling has been shown to reversibly inhibit
activation [ ], supporting a role in channel function. interestingly, loop f adopted a different conformation in our structures at ph compared to ph or ph (fig b–c), suggesting this region samples a range of conformations; indeed, previous spin-labeling studies indicated this position, along with several neighbors on the β strand, to be highly dynamic [ ]. although its broader role in pentameric channel gating remains controversial, loop f has often been characterized as an unstructured motif that undergoes substantial rearrangement during ligand binding [ ], echoing the mechanism proposed here for glic. transmembrane residues e and k have been similarly implicated in channel function, albeit secondary to e in proton sensing. residue e on the upper m helix is exposed to solvent, and has been predicted to protonate at low ph [ ], [ ]. previous studies have shown some mutations at this position to be silent, while others dramatically alter ph sensitivity [ ], [ ], [ ], [ ], suggesting its involvement in state-dependent interactions is complex. interestingly, e has also been shown to mediate interactions with allosteric modulators via a cavity at the intersubunit interface [ ], indicating a role for this residue in agonist sensitivity and/or coupling. at k , cysteine substitution was previously shown to increase proton sensitivity [ ], consistent with a weakening of charge interactions specific to the resting state (fig ). past simulations based on x-ray structures also showed k to prefer intrasubunit interactions at rest, versus intersubunit interactions in the open state [ ], although e /k interactions were particularly apparent in the present work. our reconstructions offer a structural rationale for the predominance of open and locally closed states in the crystallographic literature.
the apparent resting state (ph ) was characterized by relatively low reconstructed resolution (fig b, fig a) and flexibility in the ecd (fig ev a, fig a), particularly at the domain interface and peripheral surfaces, potentially conferring entropic favorability. crystallization enforces conformational homogeneity, and may select for rigidified states particularly at crystal-contact surfaces; according to the model above (fig ), such conditions could bias towards a more uniform open state. interestingly, our simulations suggested the apparent open pore of the x-ray structure may not persist outside the crystal, potentially sampling more contracted conformations similar to ph- state (fig ) while remaining generally hydrated (fig ev ). conversely, cryo-em could be expected to reveal favored but flexible states (fig , fig ), with the caveat that there might instead be a bias towards higher-resolution states. a heterogeneous mixture of closed states is notably consistent with previous atomic force microscopy studies in glic [ ]. whereas loose packing of the ecd core has been proposed as a gating strategy specific to eukaryotic members of this channel family [ ], our data indicate an expanded, flexible ecd may also be important to earlier evolutionary branches. multiple glic structures reported in this work were characterized by closed pores, including states consistent with either deprotonated or protonated conditions.
it is theoretically possible that electrostatic conditions might be modified in cryo-em by interaction with the glow-discharged grid or air-water interface, masking effects of protonation. however, we consistently noted subtle shifts in stability and conformation, indicating that local effects of protonation were reflected in the major resolved class. indeed, improved resolution of several acidic residues at low ph appeared consistent with protonation, given the tendency of anionic sidechains to resolve poorly by cryo-em [ ]. notably, the protonated closed state proposed here (fig b) differs from previously reported locally closed and lipid-modulated forms, which have been captured for multiple glic variants at low ph [ ], [ ]–[ ]; the ecd in these structures is generally indistinguishable from that of the open state, suggesting the corresponding variations or modulators decouple extracellular transitions from pore opening [ ], [ ]. in contrast, the minority class at ph (state ) approached open-state properties in both domains, including a contracted and untwisted ecd (fig ev ) and a partly expanded pore (fig ). with a resting-like backbone configuration, but sidechains consistent with proton activation, the low-ph cryo-em (state- ) structure may correspond to a pre-open state on the opening pathway [ ], [ ], [ ]. the predominance of this state implies a submaximal open probability even at ph . due in part to its low conductance in single-channel recordings [ ], the open probability of glic is not well established;
however, other family members, including some subtypes of nicotinic acetylcholine and gabaa receptors [ ], [ ], are known to flicker between conductance states even at high agonist concentrations, consistent with a large population of closed channels. an intriguing alternative is that this structure corresponds to a desensitized state, which would be expected to predominate at ph subsequent to channel opening [ ]. however, desensitized states in this family are generally thought to transition through an open state upon ligand dissociation, before returning to rest; aside from sidechain reorientation, no structural rearrangements are immediately obvious that would prevent transition directly to the resting state (fig ). indeed, none of our cryo-em models resembled desensitized structures of other pentameric channels, thought to retain an expanded upper tmd [ ], but block conduction at a secondary, intracellular gate [ ]. although proton activation appears to be a particular adaptation in glic, remodeling at the subunit/domain interface mirrors putative gating mechanisms in several of its ligand-activated relatives (appendix fig s ). in particular, protonation of e and e is proposed to release charge interactions in the β –β loop and upper m helix, enabling remodeling in loop f and the m –m loop (fig , fig , fig b). further rearrangement to the open state contracts both the β –β /m and f/m –m clefts (fig c, appendix fig s a). the same pattern is evident in agonist-bound versus apo structures of elic, glucl, glycine and nicotinic receptors (appendix fig s b–e) [ ]–[ ], and in open/desensitized versus inhibitor-bound structures of declic and gabaa receptors (appendix fig s f–g) [ ], [ ]. a noted exception is the -ht a receptor, in which loop f instead translocates outward and the m –m loop inward (appendix fig s h), suggesting that apparent open states reported for -ht a may sample a divergent mechanism of gating [ ]–[ ].
the subtle dynamics of allosteric signal transduction in pentameric ligand-gated ion channels, and their sensitivity to drug modulation, have driven substantial interest in characterizing endpoint and intermediate structures along the gating pathway. our data substantiate a protonated closed state, accompanied by a minority population with an expanded pore, and spotlight intrinsic challenges in capturing flexible conformations. we further offer a rationale for proton-stimulated sidechain remodeling of multiple residues at key interfaces, with apparent parallels in other family members. dissection of the gating landscape of a ligand-gated ion channel thus illuminates both insights and limitations of glic as a model system in this family, and supports a mechanistic model in which entropy favors a flexible, expanded ecd, with agonists stabilizing rearrangements at the subunit/domain interface.

materials and methods

glic expression and purification

expression and purification of glic-mbp was adapted from protocols published by nury and colleagues [ ]. briefly, c (de ) e. coli transformed with glic-mbp in pet- b were cultured overnight at ° c. cells were inoculated : into xyt media with μg/ml ampicillin, grown at ° c to od = . , induced with μm isopropyl-β-d- -thiogalactopyranoside, and shaken overnight at ° c. membranes were harvested from cell pellets by sonication and ultracentrifugation in buffer a ( mm nacl, mm tris-hcl ph . ) supplemented with mg/ml lysozyme, μg/ml dnase i, mm mgcl , and protease inhibitors, then frozen or immediately solubilized in % n-dodecyl-β-d-maltoside (ddm).
fusion proteins were purified in batch by amylose affinity (neb), eluting in buffer b (buffer a with . % ddm) with – mm maltose, then further purified by size exclusion chromatography in buffer b. after overnight thrombin digestion, glic was isolated from its fusion partner by size exclusion, and concentrated to – mg/ml by centrifugation.

cryo-em sample preparation and data acquisition

for freezing, quantifoil . / . cu mesh grids (quantifoil micro tools) were glow-discharged in methanol vapor prior to sample application. μl sample was applied to each grid, which was then blotted for . s and plunge-frozen into liquid ethane using an fei vitrobot mark iv. micrographs were collected on an fei titan krios kv microscope with a post energy filter gatan k -summit direct detector camera. movies were collected at nominal , x magnification, equivalent to a pixel spacing of . Å. a total dose of . e−/Å was used to collect frames over sec, using a nominal defocus range covering - . to - . µm.

image processing

motion correction was carried out with motioncor [ ]. all subsequent processing was performed through the relion . pipeline [ ]. defocus was estimated from the motion corrected micrographs using ctffind [ ]. following manual picking, initial d classification was performed to generate references for autopicking. particles were extracted after autopicking, binned and aligned to a Å density generated from the glic crystal structure (pdb id: hfi [ ]) by d auto-refinement.
the acquired alignment parameters were used to identify and remove aberrant particles and noise through multiple rounds of pre-aligned d- and d-classification. the pruned set of particles was then refined, using the initially obtained reconstruction as reference. per-particle ctf parameters were estimated from the resulting reconstruction using relion . . global beam-tilt was estimated from the micrographs and correction applied. micelle density was eventually subtracted and the final d auto-refinement was performed using a soft mask covering the protein, followed by post-processing, utilizing the same mask. local resolution was estimated using the relion implementation. post-processed densities were improved using resolvecryoem, a part of the phenix package (release . and later) [ ] based on maximum-likelihood density modification, previously used to improve maps in x-ray crystallography [ ]. densities from both relion post-processing and resolvecryoem were used for building; figures show output from resolvecryoem (fig , fig ev ). densities for minority classes were obtained by systematic and extensive d-classification rounds in relion . , with iterative modifications to parameters including angular search, t parameter, and class number.
model building
models were built starting from a template using an x-ray structure determined at ph (pdb id: npq [ ], chain a), fitted to each reconstructed density. phenix . . - [ ] real-space refinement was used to refine this model, imposing -fold symmetry through ncs restraints detected from the reconstructed cryo-em map.
the model was incrementally adjusted in coot . . . el [ ] and re-refined until conventional quality metrics were optimized in agreement with the reconstruction. model statistics are summarized in table . model alignments were performed using the match function in ucsf chimera [ ] on cα atoms, excluding extracellular loops, for residues – (ecd) or – (tmd).
md simulations
manually built cryo-em structures, as well as previously published x-ray structures (resting, pdb id: npq [ ]; open, pdb id: hfi [ ]), were used as starting models for md simulations. the amber sb-ildn force field [ ] was used to describe protein interactions. each protein was embedded in a bilayer of berger [ ] -palmitoyl- -oleoyl- sn -glycero- -phosphocholine lipids. each system was solvated in a * * nm box using the tip p water model [ ], and nacl was added to bring the system to neutral charge and an ionic strength of mm. all simulations were performed with gromacs . [ ]. systems were energy-minimized using the steepest descent algorithm, then relaxed for ps in the nvt ensemble at k using the velocity rescaling thermostat [ ]. bond lengths were constrained [ ], particle mesh ewald long-range electrostatics used [ ], and virtual sites for hydrogen atoms implemented, enabling a time step of fs. heavy atoms of the protein were restrained during relaxation, followed by another ns of npt relaxation at bar using parrinello-rahman pressure coupling [ ] and gradually releasing the restraints. finally, the system was relaxed with all unresolvable residues unrestrained for an additional ns. for each relaxed system, four replicates of μs unrestrained simulations were generated.
analyses were performed using vmd [ ], chap [ ], and mdtraj [ ]. time-dependent rmsds were calculated for cα atoms in generally resolved regions of the ecd (residues – , – ) or tmd (residues – ). the number of sodium ions around e was quantified within a distance of Å, using simulation frames sampled every ns ( total frames from simulations in each condition), as described in fig . pc analysis of the m –m loop was performed on cα atoms of residues e –p of five superposed static models (three cryo-em structures, resting and open x-ray structures), treating each subunit separately. the simulations were then projected onto pc ( % of the variance) versus pc ( % of the variance), and were plotted using kernel density estimation. representative motions for pc and pc were visualized as sequences of snapshots from blue (negative values) to purple (positive values). ecd radius and domain twist were quantified as in previous work [ ]. ecd radius was determined by the average distance from the cα-atom center-of-mass (com) of each subunit ecd to that of the full ecd, projected onto a plane perpendicular to the channel axis. domain twist was determined by the average dihedral angle defined by com coordinates of (i) a single subunit-ecd, (ii) the full ecd, (iii) the full tmd, and (iv) the same single-subunit tmd.
data availability
three-dimensional cryo-em density maps of the pentameric ligand-gated ion channel glic in detergent micelles have been deposited in the electron microscopy data bank under accession numbers emd- (ph ), emd- (ph ) and emd- (ph ), respectively. each deposition includes the cryo-em sharpened and unsharpened maps, both half-maps and the mask used for final fsc calculation. coordinates of all models have been deposited in the protein data bank. the accession numbers for the three glic structures are zgd (ph ), zgj (ph ) and zgk (ph ).
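the ecd radius and domain twist definitions given in the analysis methods above translate directly into a few lines of array arithmetic. the sketch below is an illustration under stated assumptions, not the authors' analysis code: it assumes equal-mass cα atoms (so each center of mass is a coordinate mean), takes the channel axis to be the vector from the full-tmd com to the full-ecd com, and uses hypothetical helper names (`ecd_radius_and_twist`, `toy_pentamer`) with a synthetic five-fold symmetric structure in place of real glic coordinates.

```python
import numpy as np


def com(coords):
    """Center of mass of equal-mass atoms (C-alpha only): the coordinate mean."""
    return np.asarray(coords, dtype=float).mean(axis=0)


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))


def ecd_radius_and_twist(ecd_subunits, tmd_subunits):
    """Return (mean ECD radius, mean domain twist in degrees).

    ecd_subunits / tmd_subunits: one (n_atoms, 3) array of C-alpha
    coordinates per subunit, in matching subunit order (assumed layout).
    """
    ecd_coms = np.array([com(s) for s in ecd_subunits])
    tmd_coms = np.array([com(s) for s in tmd_subunits])
    ecd_full = com(np.vstack(ecd_subunits))
    tmd_full = com(np.vstack(tmd_subunits))

    # Assumption: channel axis runs from the full-TMD COM to the full-ECD COM.
    axis = ecd_full - tmd_full
    axis = axis / np.linalg.norm(axis)

    # Radius: subunit-to-full ECD COM distance, projected onto the plane
    # perpendicular to the channel axis, averaged over subunits.
    offsets = ecd_coms - ecd_full
    in_plane = offsets - np.outer(offsets @ axis, axis)
    radius = np.linalg.norm(in_plane, axis=1).mean()

    # Twist: dihedral subunit-ECD / full ECD / full TMD / same-subunit TMD.
    # A plain arithmetic mean is adequate only for angles far from +/-180.
    twists = [
        dihedral_deg(e, ecd_full, tmd_full, t)
        for e, t in zip(ecd_coms, tmd_coms)
    ]
    return float(radius), float(np.mean(twists))


def toy_pentamer(radius=20.0, twist_deg=15.0, height=30.0):
    """Hypothetical five-fold symmetric test structure (not GLIC data)."""
    # Four atoms per domain, placed symmetrically so each COM is exact.
    jitter = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0]], dtype=float)
    ecd, tmd = [], []
    for k in range(5):
        phi = 2 * np.pi * k / 5
        psi = phi + np.radians(twist_deg)
        ecd.append(np.array([radius * np.cos(phi), radius * np.sin(phi), height]) + jitter)
        tmd.append(np.array([radius * np.cos(psi), radius * np.sin(psi), 0.0]) + jitter)
    return ecd, tmd


if __name__ == "__main__":
    r, tw = ecd_radius_and_twist(*toy_pentamer())
    print(f"radius = {r:.2f}, |twist| = {abs(tw):.2f} deg")
```

in practice the per-subunit coordinate arrays would be sliced from each trajectory frame (for example with mdtraj, which the methods cite) and the two quantities averaged over frames. the sign of the twist depends on the dihedral convention, so it should be checked against the convention of the cited previous work before comparing values.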
full input data, parameters, settings, commands and trajectory subsets from md simulations are archived at zenodo.org under doi: . /zenodo. . densities for minority classes are available upon request.
acknowledgments
the authors would like to thank the swedish cryo-em national facility staff, in particular julian conrad, josé miguel de la rosa trevin and stefan fleischmann from stockholm and michael hall from umeå, for kind assistance with data collection, modeling and supervision. this work was supported by grants from the knut and alice wallenberg foundation, the swedish research council ( - , - , - ), the swedish e-science research centre, and the bioexcel center of excellence (eu ). ur was supported by a scholarship from the sven and lilly lawski foundation. the cryo-em data were collected at the swedish national cryo-em facility funded by the knut and alice wallenberg foundation, erling persson and kempe foundations. computational resources were provided by the swedish national infrastructure for computing.
author contributions
conceptualisation: rjh, el; methodology: ur, yz, bof, rjh; software: ur, yz, bof; validation: ur, yz, bof, mc, ly; formal analysis: ur, yz; investigation: ur, yz, rjh; resources: mc, rjh, el; data curation: ur, yz, rjh, el; original draft: ur, yz, rjh; review & editing: ur, yz, bof, mc, ly, rjh, el; visualization: ur, yz, rjh; supervision: rjh, el; project administration: mc, rjh; funding acquisition: el.
conflict of interest
the authors declare that they have no conflict of interest.
references
[ ] a. tasneem, l. m. iyer, e. jakobsson, and l. aravind, “identification of the prokaryotic ligand-gated ion channels and their implications for the mechanisms and origins of animal cys-loop ion channels,” genome biol., vol. , p. r , dec. , doi: . /gb- - - -r . [ ] Á. nemecz, m. s. prevost, a. menny, and p.-j. corringer, “emerging molecular mechanisms of signal transduction in pentameric ligand-gated ion channels,” neuron, vol. , no. , pp. – , may , doi: . /j.neuron. . . . [ ] m. jaiteh, a. taly, and j. hénin, “evolution of pentameric ligand-gated ion channels: pro-loop receptors,” plos one, vol. , no. , mar. , doi: . /journal.pone. . [ ] t. lynagh and s. a. pless, “principles of agonist recognition in cys-loop receptors,” front. physiol., vol. , p. , , doi: . /fphys. . . [ ] Á. nemecz, m. s. prevost, a. menny, and p.-j. corringer, “emerging molecular mechanisms of signal transduction in pentameric ligand-gated ion channels,” neuron, vol. , no. , pp. – , may , doi: . /j.neuron. . . . [ ] c. j. b. dacosta and j. e. baenziger, “gating of pentameric ligand-gated ion channels: structural insights and ambiguities,” structure, vol. , no. , pp. – , aug. , doi: . /j.str. . . . [ ] m. gielen and p.-j. corringer, “the dual-gate model for pentameric ligand-gated ion channels activation and desensitization,” j. physiol., vol. , no. , pp. – , , doi: . /jp . [ ] p.-j. corringer et al., “atomic structure and dynamics of pentameric ligand-gated ion channels: new insight from bacterial homologues,” j. physiol., vol. , no. , pp. – , , doi: . /jphysiol. . . [ ] n.
bocquet et al., “a prokaryotic proton-gated ion channel from the nicotinic acetylcholine receptor family,” nature, vol. , no. , p. , jan. , doi: . /nature . [ ] r. j. c. hilf and r. dutzler, “structure of a potentially open state of a proton-activated pentameric ligand-gated ion channel,” nature, vol. , no. , pp. – , jan. , doi: . /nature . [ ] n. bocquet et al., “x-ray structure of a pentameric ligand-gated ion channel in an apparently open conformation,” nature, vol. , no. , pp. – , jan. , doi: . /nature . [ ] h. hu et al., “electrostatics, proton sensor, and networks governing the gating transition in glic, a proton-gated pentameric ion channel,” proc. natl. acad. sci. u. s. a., vol. , no. , pp. e –e , dec. , doi: . /pnas. . [ ] r. j. c. hilf, c. bertozzi, i. zimmermann, a. reiter, d. trauner, and r. dutzler, “structural basis of open channel block in a prokaryotic pentameric ligand-gated ion channel,” nat. struct. mol. biol., vol. , no. , pp. – , nov. , doi: . /nsmb. . [ ] h. nury et al., “x-ray structures of general anaesthetics bound to a pentameric ligand-gated ion channel,” nature, vol. , no. , pp. – , jan. , doi: . /nature . [ ] j. pan et al., “structure of the pentameric ligand-gated ion channel glic bound with anesthetic ketamine,” struct. lond. engl. , vol. , no. , pp. – , sep. , doi: . /j.str. . . . [ ] l. sauguet et al., “structural basis for potentiation by alcohols and anaesthetics in a ligand-gated ion channel,” nat. commun., vol. , p. ncomms , apr. , doi: . /ncomms . [ ] l.
sauguet et al., “structural basis for ion permeation mechanism in pentameric ligand-gated ion channels,” embo j., vol. , no. , pp. – , mar. , doi: . /emboj. . . [ ] z. fourati, l. sauguet, and m. delarue, “genuine open form of the pentameric ligand-gated ion channel glic,” acta crystallogr. d biol. crystallogr., vol. , no. , pp. – , mar. , doi: . /s . [ ] l. sauguet, z. fourati, t. prangé, m. delarue, and n. colloc’h, “structural basis for xenon inhibition in a cationic pentameric ligand-gated ion channel,” plos one, vol. , no. , p. e , feb. , doi: . /journal.pone. . [ ] b. laurent, s. murail, a. shahsavar, l. sauguet, m. delarue, and m. baaden, “sites of anesthetic inhibitory action on a cationic ligand-gated ion channel,” structure, vol. , no. , pp. – , apr. , doi: . /j.str. . . . [ ] z. fourati et al., “structural basis for a bimodal allosteric mechanism of general anesthetic modulation in pentameric ligand-gated ion channels,” cell rep., vol. , no. , pp. – , apr. , doi: . /j.celrep. . . . [ ] z. fourati, l. sauguet, and m. delarue, “structural evidence for the binding of monocarboxylates and dicarboxylates at pharmacologically relevant extracellular sites of a pentameric ligand-gated ion channel,” acta crystallogr. sect. struct. biol., vol. , no. , pp. – , jul. , doi: . /s x. [ ] h. nury et al., “one-microsecond molecular dynamics simulation of channel gating in a nicotinic receptor homologue,” proc. natl. acad. sci., vol. , no. , pp. – , apr. , doi: . /pnas. . [ ] d. mowrey, q. chen, y. liang, j. liang, y. xu, and p. tang, “signal transduction pathways in the pentameric ligand-gated ion channels,” plos one, vol. , no. , p. e , maj , doi: . /journal.pone. . [ ] g. gonzalez-gutierrez, y. wang, g. d. cymes, e. tajkhorshid, and c. grosman, “chasing the open-state structure of pentameric ligand-gated ion channels,” j. gen. physiol., p. jgp. , oct. , doi: . /jgp. . [ ] Á. nemecz, h. hu, z. fourati, c. van renterghem, m. delarue, and p.-j. 
corringer, “full mutational mapping of titratable residues helps to identify proton-sensors involved in the control of channel gating in the gloeobacter violaceus pentameric ligand-gated ion channel,” plos biol., vol. , no. , dec. , doi: . /journal.pbio. . [ ] s. basak, n. schmandt, y. gicheru, and s. chakrapani, “crystal structure and dynamics of a lipid-induced potential desensitized-state of a pentameric ligand-gated channel,” elife, vol. , , doi: . /elife. . [ ] m. s. prevost et al., “a locally closed conformation of a bacterial pentameric proton-gated ion channel,” nat. struct. mol. biol., vol. , no. , p. nsmb. , may , doi: . /nsmb. . [ ] g. gonzalez-gutierrez, l. g. cuello, s. k. nair, and c. grosman, “gating of the proton-gated ion channel from gloeobacter violaceus at ph as revealed by x-ray crystallography,” proc. natl. acad. sci. u. s. a., vol. , no. , pp. – , nov. , doi: . /pnas. . [ ] c. bertozzi, i. zimmermann, s. engeler, r. j. c. hilf, and r. dutzler, “signal transduction at the domain interface of prokaryotic pentameric ligand-gated ion channels,” plos biol., vol. , no. , p. e , mar. , doi: . /journal.pbio. . [ ] z. fourati et al., “barbiturates bind in the glic ion channel pore and cause inhibition by stabilizing a closed state♦,” j. biol. chem., vol. , no. , pp. – , feb. , doi: . /jbc.m . . [ ] a. j. thompson, h. a. lester, and s. c. r. lummis, “the structural basis of function in cys-loop receptors,” q. rev. biophys., vol. , no. , pp. – , nov. , doi: . /s . [ ] l.
sauguet et al., “crystal structures of a pentameric ligand-gated ion channel provide a mechanism for activation,” proc. natl. acad. sci. u. s. a., vol. , no. , pp. – , jan. , doi: . /pnas. . [ ] a. taly, j. hénin, j.-p. changeux, and m. cecchini, “allosteric regulation of pentameric ligand-gated ion channels,” channels, vol. , no. , pp. – , jul. , doi: . /chan. . [ ] p. velisetty and s. chakrapani, “desensitization mechanism in prokaryotic ligand-gated ion channel,” j. biol. chem., vol. , no. , pp. – , may , doi: . /jbc.m . . [ ] y. ruan et al., “structural titration of receptor ion channel glic gating by hs-afm,” proc. natl. acad. sci. u. s. a., vol. , no. , pp. – , oct. , doi: . /pnas. . [ ] a. menny et al., “identification of a pre-active conformation of a pentameric channel receptor,” elife, vol. , doi: . /elife. . [ ] b. lev et al., “string method solution of the gating pathways for a pentameric ligand-gated ion channel,” proc. natl. acad. sci., vol. , no. , pp. e –e , may , doi: . /pnas. . [ ] g. klesse, s. rao, m. s. p. sansom, and s. j. tucker, “chap: a versatile tool for the structural and functional annotation of ion channel pores,” j. mol. biol., vol. , no. , pp. – , aug. , doi: . /j.jmb. . . . [ ] p. velisetty, s. v. chalamalasetti, and s. chakrapani, “structural basis for allosteric coupling at the membrane-protein interface in glic,” j. biol. chem., p. jbc.m . , dec. , doi: . /jbc.m . . [ ] m. nys, d. kesters, and c. ulens, “structural insights into cys-loop receptor function and ligand recognition,” biochem. pharmacol., vol. , no. , pp. – , oct. , doi: . /j.bcp. . . . [ ] r. j. howard et al., “structural basis for alcohol modulation of a pentameric ligand-gated ion channel,” proc. natl. acad. sci. u. s. a., vol. , no. , pp. – , jul. , doi: . /pnas. . [ ] c. d. dellisanti, s. m. hanson, l. chen, and c. czajkowski, “packing of the extracellular domain hydrophobic core has evolved to facilitate pentameric ligand-gated ion channel function,” j. biol. 
chem., vol. , no. , pp. – , feb. , doi: . /jbc.m . . [ ] c. f. hryc et al., “accurate model annotation of a near-atomic resolution cryo-em map,” proc. natl. acad. sci., mar. , doi: . /pnas. . [ ] r. lape, d. colquhoun, and l. g. sivilotti, “on the nature of partial agonism in the nicotinic receptor superfamily,” nature, vol. , no. , pp. – , aug. , doi: . /nature . [ ] n. mukhtasimova, w. y. lee, h.-l. wang, and s. m. sine, “detection and trapping of intermediate states priming nicotinic receptor channel opening,” nature, vol. , no. , p. , may , doi: . /nature . [ ] c. carignano, e. p. barila, and g. spitzmaul, “analysis of neuronal nicotinic acetylcholine receptor α β activation at the single-channel level,” biochim. biophys. acta, vol. , no. , pp. – , sep. , doi: . /j.bbamem. . . . [ ] a. l. germann, s. r. pierce, t. c. senneff, a. b. burbridge, j. h. steinbach, and g. akk, “steady-state activation and modulation of the synaptic-type α β γ l gabaa receptor by combinations of physiological and clinical ligands,” physiol. rep., vol. , no. , p. e , , doi: https://doi.org/ . /phy . . [ ] p. kumar et al., “cryo-em structures of a lipid-sensitive pentameric ligand-gated ion channel embedded in a phosphatidylcholine-only bilayer,” proc. natl. acad. sci., vol. , no. , pp. – , jan. , doi: . /pnas. . [ ] t. althoff, r. e. hibbs, s. banerjee, and e. gouaux, “x-ray structures of glucl in apo states reveal a gating mechanism of cys-loop receptors,” nature, vol. , no. , pp. – , aug. , doi: . /nature . [ ] r. e. hibbs and e.
gouaux, “principles of activation and permeation in an anion-selective cys-loop receptor,” nature, vol. , no. , pp. – , jun. , doi: . /nature . [ ] a. kumar et al., “mechanisms of activation and desensitization of full-length glycine receptor in lipid nanodiscs,” nat. commun., vol. , no. , p. , jul. , doi: . /s - - - . [ ] m. m. rahman et al., “structure of the native muscle-type nicotinic receptor and inhibition by snake venom toxins,” neuron, vol. , no. , pp. - .e , jun. , doi: . /j.neuron. . . . [ ] a. gharpure et al., “agonist selectivity and ion permeation in the α β ganglionic nicotinic receptor,” neuron, vol. , no. , pp. - .e , nov. , doi: . /j.neuron. . . . [ ] h. hu, r. j. howard, u. bastolla, e. lindahl, and m. delarue, “structural basis for allosteric transitions of a multidomain pentameric ligand-gated ion channel,” proc. natl. acad. sci., vol. , no. , pp. – , jun. , doi: . /pnas. . [ ] j. j. kim et al., “shared structural mechanisms of general anaesthetics and benzodiazepines,” nature, vol. , no. , pp. – , sep. , doi: . /s - - - . [ ] s. basak, y. gicheru, s. rao, m. s. p. sansom, and s. chakrapani, “cryo-em reveals two distinct serotonin-bound conformations of full-length -ht a receptor,” nature, vol. , no. , p. , nov. , doi: . /s - - - . [ ] l. polovinkin et al., “conformational transitions of the serotonin -ht receptor,” nature, vol. , no. , pp. – , nov. , doi: . /s - - - . [ ] s. basak et al., “cryo-em structure of -ht a receptor in its resting conformation,” nat. commun., vol. , no. , dec. , doi: . /s - - - . [ ] s. q. zheng, e. palovcak, j.-p. armache, k. a.
verba, y. cheng, and d. a. agard, “motioncor - anisotropic correction of beam-induced motion for improved cryo-electron microscopy,” nat. methods, vol. , no. , pp. – , apr. , doi: . /nmeth. . [ ] j. zivanov et al., “new tools for automated high-resolution cryo-em structure determination in relion- ,” elife, vol. , p. e , nov. , doi: . /elife. . [ ] a. rohou and n. grigorieff, “ctffind : fast and accurate defocus estimation from electron micrographs,” j. struct. biol., vol. , no. , pp. – , nov. , doi: . /j.jsb. . . . [ ] p. d. adams et al., “phenix : a comprehensive python-based system for macromolecular structure solution,” acta crystallogr. d biol. crystallogr., vol. , no. , pp. – , feb. , doi: . /s . [ ] t. c. terwilliger, s. j. ludtke, r. j. read, p. d. adams, and p. v. afonine, “improvement of cryo-em maps by density modification,” nat. methods, vol. , no. , art. no. , sep. , doi: . /s - - - . [ ] p. emsley and k. cowtan, “coot : model-building tools for molecular graphics,” acta crystallogr. d biol. crystallogr., vol. , no. , pp. – , dec. , doi: . /s . [ ] e. f. pettersen et al., “ucsf chimera--a visualization system for exploratory research and analysis,” j. comput. chem., vol. , no. , pp. – , oct. , doi: . /jcc. . [ ] k. lindorff-larsen et al., “improved side-chain torsion potentials for the amber ff sb protein force field,” proteins, vol. , no. , pp. – , jun. , doi: . /prot. . [ ] o. berger, o. edholm, and f. jähnig, “molecular dynamics simulations of a fluid bilayer of dipalmitoylphosphatidylcholine at full hydration, constant pressure, and constant temperature.,” biophys. j., vol. , no. , pp. – , may . [ ] w. l. jorgensen, j. chandrasekhar, j. d. madura, r. w. impey, and m. l. klein, “comparison of simple potential functions for simulating liquid water,” j. chem. phys., vol. , no. , pp. – , jul. , doi: . / . . [ ] m. j. 
abraham et al., “gromacs: high performance molecular simulations through multi-level parallelism from laptops to supercomputers,” softwarex, vol. – , pp. – , sep. , doi: . /j.softx. . . . [ ] g. bussi, d. donadio, and m. parrinello, “canonical sampling through velocity rescaling,” j. chem. phys., vol. , no. , p. , jan. , doi: . / . . [ ] b. hess, “p-lincs: a parallel linear constraint solver for molecular simulation,” j. chem. theory comput., vol. , no. , pp. – , jan. , doi: . /ct b. [ ] u. essmann, l. perera, m. l. berkowitz, t. darden, h. lee, and l. g. pedersen, “a smooth particle mesh ewald method,” j. chem. phys., vol. , no. , pp. – , nov. , doi: . / . . [ ] m. parrinello and a. rahman, “crystal structure and pair potentials: a molecular-dynamics study,” phys. rev. lett., vol. , no. , pp. – , oct. , doi: . /physrevlett. . . [ ] w. humphrey, a. dalke, and k. schulten, “vmd: visual molecular dynamics,” j. mol. graph., vol. , no. , pp. – , feb. , doi: . / - ( ) - . [ ] r. t. mcgibbon et al., “mdtraj: a modern open library for the analysis of molecular dynamics trajectories,” biophys. j., vol. , no. , pp. – , oct. , doi: . /j.bpj. . . .
figure legends
figure : differential resolution of glic cryo-em structures with varying ph. a. cartoon representations of glic, viewed from the membrane plane (top) or from the extracellular side (bottom). pentameric rings represent the connected extracellular (ecd, light gray) and transmembrane (tmd, medium gray) domains, with the latter embedded in a lipid bilayer (gradient) and surrounding a membrane-spanning pore formed by the second helix from each subunit (m , dark gray). b. cryo-em density for the majority class (state ) at ph to . Å overall resolution, viewed as in panel a from the membrane plane (top) or from the extracellular side (bottom). density is colored by local resolution according to scale bar at far right, and contoured at both high (left) and low threshold (right) to reveal fine and coarse detail, respectively. c. density viewed as in panel b for state at ph , reconstructed to . Å overall resolution. d. density as in panel b for state at ph , reconstructed to . Å overall resolution.
figure : sidechain rearrangements at subunit interfaces in low-ph structures. a. overlay of predominant (state- ) glic cryo-em structures at ph (blue), ph (green), and ph (lavender), aligned on the full pentamer. two adjacent subunits are viewed as ribbons from the channel pore, showing key motifs including the β –β and pro loops and m –m helices from the principal subunit (p), and loop f from the complementary subunit (c). b. zoom views of the upper gray-boxed region in panel a, showing cryo-em densities (mesh at σ = . ) and sidechain atoms (sticks, colored by heteroatom) around the intersubunit ecd interface between a single principal β –β loop and complementary loop f at each ph. as indicated by dotted circles, sidechains including β –β residues k and e could not be definitively built at ph (left) or ph (center), but were better resolved at ph (right), including a possible hydrogen bond between e and t (dashed line, . Å). c.
zoom views of the black-boxed region in panel a, showing key sidechains (sticks, colored by heteroatom) at the domain interface between one principal β –β , pre-m , and m –m region, and the complementary loop-f and m region. dotted circles indicate sidechains that could not be definitively built in the corresponding conditions; dashed lines indicate possible hydrogen bonds implicated here in proton-stimulated conformational cycling. residues contributing to a conserved electrostatic network at the domain interface (d , r , y ) are also shown. d. zoom views of the lower gray-boxed region in panel a, showing cryo-em densities (mesh) and sidechain atoms (sticks, colored by heteroatom) around the intersubunit tmd interface between principal and complementary m –m regions at each ph. a potential hydrogen bond between e and k at ph (left, dashed line, . Å) is disrupted at ph (center) and ph (right), allowing k to reorient towards the subunit interface.
figure : remodeled electrostatic contacts revealed by molecular dynamics. a. zoom views as in fig b of the ecd interface between a single principal (p, right) β –β loop and complementary (c, left) loop f (lavender ribbons) in representative snapshots from md simulations of the ph- (state- ) cryo-em structure, with sidechains modified to approximate resting (deprotonated, top) or activating (protonated, bottom) conditions. depicted residues and proximal ions (sticks, colored by heteroatom) show deprotonated e in contact with na +, while protonated e interacts with t . b.
charge contacts between e and environmental na + ions in simulations under deprotonated (solid) but not protonated (striped) conditions of state- cryo-em structures determined at ph (blue), ph (green), or ph (lavender). histograms represent median ± % confidence interval (ci) over all simulations in the corresponding condition. horizontal bars represent median ± ci values for simulations of resting (gray) or open (black) x-ray structures. c. histograms as in panel b showing intersubunit cα-distances between e and t , which decrease in protonated (striped) versus deprotonated (solid) conditions. d. zoom views as in fig d of the tmd interface between principal (p, right) and complementary (c, left) m –m loops (lavender ribbons) in representative snapshots from simulations of the ph- (state- ) cryo-em structure. depicted residues (sticks, colored by heteroatom) show k oriented down towards e in deprotonated conditions (top), but out towards the subunit interface in protonated conditions (bottom). e. histograms as in panel b showing electrostatic contacts between e and k , which decrease in ph- (lavender) versus ph- (blue) and ph- structures (green), and in protonated (striped) versus deprotonated (solid) simulation conditions. f. principal component (pc) analysis of m –m loop motions in simulations under deprotonated (top) or protonated conditions (bottom) of state- cryo-em structures determined at ph (blue), ph (green), and ph (lavender). for comparison, simulations of previous resting (gray) and open (black) x-ray structures are shown at right, and open-structure results are superimposed in each panel. inset cartoons illustrate structural transitions associated with dominant pcs (blue–lavender from negative to positive values), representing flipping of residue k (pc ) and stretching of the m –m loop (pc ).
figure : minority classes suggest alternative states. a. overlay as in figure a of state- (lavender) and state- (purple) glic cryo-em structures, along with apparent resting (white, pdb id: npq) and open (gray, pdb id: hfi) x-ray structures, aligned on the full pentamer. adjacent principal (p) and complementary (c) subunits are viewed as ribbons from the channel pore. b. zoom views of the black-boxed region in panel a, showing key motifs at the domain interface between one principal β -β , pre-m , and m –m region, and the complementary loop-f and m region, for resting (white) and open (gray) x-ray structures overlaid with ph- cryo-em state (top, lavender) or state (bottom, purple). c. zoom views as in panel b, showing cryo-em densities (mesh) and backbone ribbons for ph- state (top, lavender) or state (bottom, purple). d. pore profiles [ ] representing cα radii for ph- cryo-em state- (lavender) and state- (purple) structures, open x-ray (black) structure, and quadruplicate -μs md simulations of the open x-ray model (median, dashed black; % confidence interval, gray).
figure : protonation and activation in glic ph gating. a. cartoon of the glic resting state, corresponding to a deprotonated closed conformation, as represented by the predominant cryo-em structure at ph . views are of the full protein (top) from the membrane plane, and of the ecd (middle) and tmd (bottom) from the extracellular side, showing key motifs at two opposing subunit interfaces including the principal β –β (green) and m –m loops (blue), complementary f (purple) and β –β (dark gray) loops, and the remainder of the protein in light gray.
by the model proposed here, under resting conditions the key acidic residue e (green circles) in the β –β loop is deprotonated, and involved in transient interactions with environmental cations (e.g. na +, black circles). flexibility of the corresponding ecd is indicated by motion lines, associated with relatively low resolution by cryo-em and high rmsd in md simulations. in parallel, deprotonated e (light blue circles) in the m helix attracts k (dark blue circles) in the m –m loop, maintaining a contracted upper pore.
b. cartoon as in panel a, showing a protonated but still closed conformation, as represented by the predominant cryo-em structure at ph . in the ecd, protonation of e releases environmental cations and enables it instead to form a stabilizing contact with the complementary subunit via t (purple circles) in loop f, associated with partial rigidification of the ecd. in the tmd, protonation of e releases k , allowing it to orient outward/upward towards the subunit/domain interface.
c. cartoon as in panel a, showing the putative protonated open state, as represented by previous open x-ray structures. key sidechains (e , t , e , k ) are arranged similarly to the protonated closed state, accompanied by general contraction of the ecd including loop f, expansion of the upper tmd including the m –m loop, and opening of the ion conduction pathway.

table : cryo-em data processing and model building statistics.
(columns) ph data set ph data set ph data set
data collection and processing
microscope fei titan krios fei titan krios fei titan krios
magnification , , ,
voltage (kv)
electron exposure (e - /Å ) ~ ~ ~
defocus range (μm) . – . . – . . – .
pixel size (Å) . . .
symmetry imposed c c c
number of images ~ ~ ~
particles picked ~ , ~ million ~ ,
particles refined , , ,
refinement
initial model used npq npq npq
resolution (Å) . . .
fsc threshold . . .
map sharpening b-factor - - -
model composition
non-hydrogen protein atoms , , ,
protein residues
ligands
b-factor (Å )
rmsd
bond lengths (Å) . . .
bond angles (º) . . .
validation
molprobity score . . .
clashscore . . .
poor rotamers (%)
ramachandran plot
favored (%) . . .
allowed (%) . . .
outliers (%)

expanded view figure legends
figure ev : cryo-em image-processing pipeline. a. representative micrograph from grid screening on a falcon- detector (talos-arctica), showing detergent-solubilized glic particles. b. representative d class averages at . Å/px in a x pixel box and a -Å mask. c. overview of cryo-em processing pipelines for data collected at ph (blue), ph (green), and ph (lavender) (see methods).
figure ev : cryo-em densities in α-helical and β-strand regions. a.
density (mesh) and corresponding atomic model (sticks, colored by heteroatom) for the m helix (e –e ) at ph (blue, left), ph (green, center), and ph (lavender, right). b. density and corresponding model, shown as in panel a, for the β strand (p –i ). sidechains that could not be definitively built at ph (d , q , l ) are represented by cβ atoms. figure ev : interfacial rearrangements in previous x-ray structures. a. overlay as in figure a of previous x-ray structures crystallized under resting (white, pdb id: npq) and activating (gray, pdb id: hfi) conditions. two adjacent subunits are viewed as ribbons from the channel pore, showing key motifs including the β –β and pro loops and m –m helices from the principal subunit (p), and loop f from the complementary subunit (c). b. zoom views as in figure c of the black-boxed region in panel a, showing key sidechains (sticks, colored by heteroatom) at a single domain interface in resting (white, left) and open (gray, right) x-ray structures. dotted circle indicates the sidechain of k , which could not be definitively built in resting conditions. center panel shows major backbone transitions from overlaid resting to open states (orange arrows). figure ev : ecd flexibility in closed-pore simulations. a. root mean-squared deviations (rmsds) over time for cα-atoms of the ecd (solid) and tmd (dotted) in four replicate -μs md simulations of cryo-em structures determined at ph (blue), ph (green), and ph (lavender).
simulations were performed with sidechain charges approximating resting (deprotonated, top) or activating (protonated, bottom) conditions [ ]. reference simulations of resting (gray, top) and open (black, bottom) x-ray structures are shown at right. b. hydration at the hydrophobic gate during simulations under deprotonated (solid) or protonated (striped) conditions as depicted in panel a, quantified by water occupancy between i (i ’) and a (a ’) in the channel pore. histograms represent median ± % confidence interval (ci) over all simulations in the corresponding condition.
figure ev : contraction and untwisting of the ecd in ph- state . a. views as in figure b of ph- state (lavender) and state (purple) cryo-em densities, shown from the membrane plane (left) or extracellular side (right). arrows represent inward contraction and counter-clockwise untwisting of the ecd in state relative to state . b. histograms indicating parallel trends in ecd contraction (left) and untwisting (right) from resting (gray) to open (black) x-ray structures, and from ph- state- (lavender) to state- (purple) cryo-em structures.
[figure : cartoon panels a, b, c (deprotonated closed, protonated closed, protonated open) with labels ecd, tmd, m , β –β , f, and m –m ]

disorder is a critical component of lipoprotein sorting in gram-negative bacteria
jessica el rayes , $, joanna szewczyk , $, michael deghelt , , andré matagne , bogdan i.
iorga , seung-hyun cho , , and jean-françois collet , * welbio, avenue hippocrate , brussels, belgium. de duve institute, université catholique de louvain, avenue hippocrate , brussels, belgium. centre d’ingéniérie des protéines, institut de chimie b , université de liège, allée de la chimie , liège, sart tilman, belgium. université paris-saclay, cnrs upr , institut de chimie des substances naturelles, gif-sur-yvette, france. $both authors contributed equally to the work *correspondence: jfcollet@uclouvain.be abstract ( max) gram-negative bacteria express structurally diverse lipoproteins in their envelope. here we found that approximately half of lipoproteins destined to the escherichia coli outer membrane display an intrinsically disordered linker at their n-terminus. intrinsically disordered regions are common in proteins, but establishing their importance in vivo has remained challenging. here, as we sought to unravel how lipoproteins mature, we discovered that unstructured linkers are required for optimal trafficking by the lol lipoprotein sorting system: linker deletion re-routes three unrelated lipoproteins to the inner membrane. focusing on the stress sensor rcsf, we found that replacing the linker with an artificial peptide restored normal outer membrane targeting only when the peptide was of similar length and disordered. overall, this study reveals the role played by intrinsic disorder in lipoprotein sorting, providing mechanistic insight into the biogenesis of these proteins and suggesting that evolution can select for intrinsic disorder that supports protein function. introduction the cell envelope is the morphological hallmark of escherichia coli and other gram-negative bacteria. it is composed of the inner membrane, a classical phospholipid bilayer, as well as the outer membrane, an asymmetric bilayer with phospholipids in the inner leaflet and lipopolysaccharides in the outer leaflet . 
this lipid asymmetry enables the outer membrane to function as a barrier that effectively prevents toxic compounds in the environment from diffusing into the cell. the inner and outer membranes are separated by the periplasm, a viscous compartment that contains a thin layer of peptidoglycan, also known as the cell wall. the cell envelope is essential for growth and survival, as illustrated by the fact that several antibiotics such as the β-lactams target mechanisms of envelope assembly. mechanisms involved in envelope biogenesis and maintenance are therefore attractive targets for novel antibacterial strategies.
approximately one-third of e. coli proteins are targeted to the envelope, either as soluble proteins present in the periplasm or as proteins inserted in one of the two membranes. while inner membrane proteins cross the lipid bilayer via one or more hydrophobic α-helices, proteins inserted in the outer membrane generally adopt a β-barrel conformation. another important group of envelope proteins is the lipoproteins, which are globular proteins anchored to one of the two membranes by a lipid moiety. lipoproteins carry out a variety of important functions in the cell envelope: they participate in the biogenesis of the outer membrane by inserting lipopolysaccharide molecules and β-barrel proteins, they function as stress sensors triggering signal transduction cascades when envelope integrity is altered, and they control processes that are important for virulence. the diverse roles played by lipoproteins in the cell envelope have drawn a lot of attention lately, revealing how crucial these proteins are in a wide range of vital processes and identifying them as attractive targets for antibiotic development. yet, a detailed understanding of the mechanisms involved in lipoprotein maturation and trafficking is still missing.
lipoproteins are synthesized in the cytoplasm as precursors with an n-terminal signal peptide.
the last four c-terminal residues of this signal peptide, known as the lipobox, function as a molecular determinant of lipid modification unique to bacteria; only the cysteine at the last position of the lipobox is strictly conserved . after secretion of the lipoprotein into the periplasm, the thiol side-chain of the cysteine is first modified with a diacylglyceryl moiety by prolipoprotein diacylglyceryl transferase (lgt) (extended data fig. a, step ). then, signal peptidase ii (lspa) catalyzes cleavage of the signal peptide n-terminally of the lipidated cysteine before apolipoprotein n-acyltransferase (lnt) adds a third acyl group to the n-terminal amino group of the cysteine (extended data fig. a, steps - ). most mature lipoproteins are then transported to the outer membrane by the lol system. lol consists of lolcde, an abc transporter that extracts lipoproteins from the inner membrane and transfers them to the soluble periplasmic chaperone lola (extended data fig. a, steps - ) . lola escorts lipoproteins across the periplasm, binding their hydrophobic lipid tail, and delivers them to the outer membrane lipoprotein lolb (extended data fig. a, step ). lolb finally anchors lipoproteins to the inner leaflet of the outer membrane using a mechanism that remains poorly characterized (extended data fig. a, step ). in most gram-negative bacteria, a few lipoproteins remain in the inner membrane , . the current view is that inner membrane retention depends on the identity of the two residues located immediately downstream of the n-terminal cysteine on which the lipid moiety is attached ; this sequence, two amino acids in length, is known as the lol sorting signal. when lipoproteins have an aspartate at position + and an aspartate, glutamate, or glutamine at position + , they remain in the inner membrane , , possibly because strong electrostatic interactions between the + aspartate and membrane phospholipids prevent their interaction with lolcde . 
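the retention rule described above can be sketched as a small predicate. this is a minimal illustration and not the authors' method: the function name and the input convention (a mature sequence whose first residue is the lipidated cysteine) are assumptions made here, and the example sequences are toy strings rather than real lipoproteins. following the text, the residue immediately after the cysteine is checked for aspartate and the next one for aspartate, glutamate, or glutamine.

```python
def predicted_inner_membrane_retention(mature_seq: str) -> bool:
    """Apply the E. coli-type Lol sorting-signal rule to a mature lipoprotein.

    Assumption (hypothetical convention): mature_seq starts at the lipidated
    cysteine, so the two residues of the sorting signal are mature_seq[1:3].
    """
    seq = mature_seq.upper()
    if len(seq) < 3 or seq[0] != "C":
        raise ValueError("expected a mature sequence starting with the lipidated cysteine")
    first_after_cys, second_after_cys = seq[1], seq[2]
    # Retention: aspartate directly after the cysteine, followed by D, E, or Q.
    return first_after_cys == "D" and second_after_cys in {"D", "E", "Q"}


# Toy sequences (not real proteins) illustrating both outcomes.
print(predicted_inner_membrane_retention("CDEAAA"))
print(predicted_inner_membrane_retention("CSSAAA"))
```

as the text goes on to note, this rule reflects e. coli data and does not capture the variations reported in other bacteria, so the predicate should be read as a statement of the current model rather than a general classifier.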
however, this model is largely based on data obtained in e. coli and variations have been described in other bacteria. for instance, in the pathogen pseudomonas aeruginosa, an aspartate is rarely found at position + and inner membrane retention appears to be determined by residues + and + , . surprisingly, lipoproteins are well sorted in p. aeruginosa cells expressing the e. coli lolcde complex , despite their different lol sorting signal. this result cannot be explained by the current model of lipoprotein sorting, underscoring that our comprehension of the precise mechanism that governs the triage of lipoproteins remains incomplete. excitingly, more unresolved questions regarding lipoprotein biogenesis have recently been raised. first, it was reported that a lola-lolb-independent trafficking route to the outer membrane exists in e. coli , but the factors involved have remained unknown. second, although lipoproteins have traditionally been considered to be exposed to the periplasm in e. coli and many other bacterial models , a series of investigations have started to challenge this view by identifying lipoproteins on the surface of e. coli, vibrio cholerae, and salmonella typhimurium - . overall, the field is beginning to explore a lipoprotein topological landscape that is more complex than previously assumed and raising intriguing questions about the signals that control surface targeting and exposure. here, stimulated by the hypothesis that crucial details of the mechanisms underlying lipoprotein maturation remained to be elucidated, we sought to identify novel molecular determinants controlling lipoprotein biogenesis. first, we systematically analyzed the sequence of the lipoproteins with validated localization encoded by the e. coli k genome and found that half of the outer membrane lipoproteins display a long and intrinsically disordered linker at their n-terminus. 
intrigued by these unstructured segments, we then probed their importance for the biogenesis of rcsf, nlpd, and pal, three structurally and functionally unrelated outer membrane lipoproteins. unexpectedly, we found that deleting the linker, while keeping the lol sorting signal intact, altered the targeting of all three lipoproteins to the outer membrane, with physiological consequences. focusing on rcsf, we determined that both the length and disordered character of the linker were important. remarkably, lowering the load of the lol system by deleting lpp, which encodes the most abundant lipoprotein (~ million copies per cell ), restored normal outer membrane targeting of linker-less rcsf, indicating that the n-terminal linker is required for optimal lipoprotein processing by lol. taken together, these observations reveal the unsuspected role played by protein intrinsic disorder in lipoprotein biogenesis.

results

half of e. coli lipoproteins present long disordered segments at their n-termini

in an attempt to discover novel molecular determinants controlling the biogenesis of lipoproteins, we decided to systematically analyze the sequence of the lipoproteins encoded by the e. coli genome (strain mg ) in search of unidentified structural features. e. coli encodes ~ validated lipoproteins , of which have been experimentally shown to localize in the outer membrane . comparative modeling of existing x-ray, cryogenic electron microscopy (cryo-em), and nuclear magnetic resonance (nmr) structures revealed that approximately half of these outer membrane lipoproteins display a long segment (> residues) that is predicted to be disordered at the n-terminus (fig. , extended data fig. , extended data table ). in contrast, only one of the lipoproteins that remain in the inner membrane (dcrb; extended data fig. , extended data table ) had a long, disordered linker, suggesting that disordered peptides may be important for lipoprotein sorting.
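the kind of screen described above can be illustrated with a short sketch. it is not the authors' pipeline, which relied on comparative modeling of existing structures: here the per-residue disorder scores, the score cutoff of 0.5, and both function names are hypothetical, and the minimum linker length is left as a required argument because the threshold is missing from this text.

```python
from typing import Sequence


def n_terminal_linker_length(disorder: Sequence[float],
                             start: int = 0,
                             cutoff: float = 0.5) -> int:
    """Length of the contiguous disordered run that begins at index `start`.

    `disorder` holds one score per residue of the mature protein (hypothetical
    input); a residue counts as disordered when its score is >= `cutoff`.
    """
    length = 0
    for score in disorder[start:]:
        if score < cutoff:
            break  # first ordered residue ends the N-terminal linker
        length += 1
    return length


def has_long_linker(disorder: Sequence[float],
                    min_length: int,
                    start: int = 0,
                    cutoff: float = 0.5) -> bool:
    """True if the N-terminal disordered run is longer than `min_length` residues."""
    return n_terminal_linker_length(disorder, start, cutoff) > min_length


# Toy scores: three disordered residues followed by an ordered one.
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.9]
print(n_terminal_linker_length(toy_scores))
```

only the run that starts at the n-terminus is counted, so a disordered stretch later in the protein does not qualify; this mirrors the focus of the text on linkers located at the n-terminus.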
deleting the n-terminal linker of rcsf, nlpd, and pal perturbs their targeting to the outer membrane intrigued by the presence of these n-terminal disordered segments in so many outer membrane lipoproteins, we decided to investigate their functional importance. we selected three structurally unrelated lipoproteins whose function could easily be assessed: the stress sensor rcsf (which triggers the rcs signaling cascade when damage occurs in the envelope ), nlpd (which activates the periplasmic n-acetylmuramyl-l-alanine amidase amic, which is involved in peptidoglycan cleavage during cell division , ), and the peptidoglycan-binding lipoprotein pal (which is important for outer membrane constriction during cell division ). we began by preparing truncated versions of rcsf, nlpd, and pal devoid of their n-terminal unstructured linkers (extended data fig. b, extended data fig. ; rcsf∆ - , pal∆ - , and nlpd∆ - ). note that the lipidated cysteine residue (+ ) and the lol sorting signal (the amino acids at positions + and + ) were not altered in rcsf∆ - , pal∆ - , and nlpd∆ - , nor in any of the constructs discussed below (extended data table ). for pal, although the unstructured linker spans residues - (fig. ), we used pal∆ - because pal∆ - was either degraded or not detected by the antibody (data not shown). we first tested whether the truncated lipoproteins were still correctly extracted from the inner membrane and transported to the outer membrane. the membrane fraction was prepared from cells expressing the three variants independently, and the outer and inner membranes were separated using sucrose density gradients (methods). whereas wild-type rcsf, nlpd, and pal were mostly detected (> %) in the outer membrane fraction, as expected, ~ % of rcsf∆ - and ~ % of nlpd∆ - were retained in the inner membrane (fig. a, b). the sorting of pal was also affected, although to a lesser extent: % of pal∆ - was retained in the inner membrane (fig. c). 
notably, the expression levels of the three linker-less variants were similar (nlpd∆ - ) or lower (rcsf∆ - ; pal∆ - ) than those of the wild-type proteins (extended data fig. ), indicating that accumulation in the inner membrane did not result from increased protein abundance. we then tested the impact of linker deletion on the function of these three proteins. in cells expressing rcsf∆ - , the rcs system was constitutively turned on (fig. d); when rcsf accumulates in the inner membrane, it becomes available for interaction with igaa, its downstream rcs partner in the inner membrane , . likewise, expression of nlpd∆ - did not rescue the chaining phenotype (fig. e) exhibited by cells lacking both nlpd and envc, an activator of the amidases amia and amib . finally, pal∆ - partially rescued the sensitivity of the pal mutant to sds-edta that results from increased membrane permeability (fig. f). however, this observation needs to be considered with caution given that pal∆ - seemed to be expressed at lower levels than wild-type pal (extended data fig. ). thus, preventing normal targeting of rcsf, nlpd and pal to the outer membrane had functional consequences. rcsf variants with unstructured artificial linkers of similar lengths are normally targeted to the outer membrane the results above were surprising because they revealed that the normal targeting of rcsf, nlpd, and pal to the outer membrane does not only require an appropriate lol sorting signal, as proposed by the current model for lipoprotein sorting , but also the presence of an n-terminal linker. we selected rcsf, whose accumulation in the inner membrane can be easily tracked by monitoring rcs activity , , to investigate the structural features of the linker controlling lipoprotein maturation; keeping as little as % of the total pool of rcsf molecules in the inner membrane is sufficient to fully activate rcs . 
we first tested whether changing the sequence of the n-terminal segment while preserving its disordered character still yielded normal targeting of the protein to the outer membrane. to that end, we prepared an rcsf variant in which the n-terminal linker was replaced by an artificial, unstructured sequence (extended data table , extended data fig. , extended data fig. ) of similar length and consisting mostly of gs repeats (rcsfgs). substituting the wild-type linker with this artificial sequence was remarkably well tolerated by rcsf: rcsfgs was targeted normally to the outer membrane (fig. a) and did not constitutively activate the stress system (fig. b). thus, although rcsfgs has an n-terminus with a completely different primary structure, it behaved like the wild-type protein. we then investigated whether the n-terminal linker required a minimal length for proper targeting and function. we therefore constructed two rcsf variants with shorter, unstructured, artificial linkers (rcsfgs and rcsfgs , with linkers of and residues, respectively; extended data table , extended data fig. , extended data fig. ). importantly, rcsfgs and, to a greater extent, rcsfgs did not properly localize to the outer membrane: the shorter the linker, the more rcsf remained in the inner membrane (fig. a). consistent with the amount of rcsfgs and rcsfgs retained in the inner membrane, rcs activation levels were inversely related to linker length (fig. b). the disordered character of the linker is required for normal targeting taken together, the results above demonstrated that the rcsf linker can be replaced with an artificial sequence lacking secondary structure, provided that it is of appropriate length. next, we sought to directly probe the importance of having a disordered linker by replacing the rcsf linker with an alpha-helical segment amino acids long from the periplasmic chaperone fkpa (rcsffkpa; extended data table , extended data fig. , extended data fig. ). 
introducing order at the n-terminus of rcsf dramatically impacted the protein distribution between the two membranes: rcsffkpa was substantially retained in the inner membrane (fig. c) and constitutively activated rcs (fig. d). as alpha-helical segments are considerably shorter than unstructured sequences containing a similar number of amino acids, we also prepared an rcsf variant (rcsfcol) with a longer alpha helix from the helical segment of colicin ia, which is amino acids in length and also predicted to remain folded in the rcsfcol construct (extended data table , extended data fig. , extended data fig. ). however, doubling the size of the helix had no impact, with rcsfcol behaving similarly to rcsffkpa (fig. c, d). together, these data demonstrate that having an n-terminal disordered linker downstream of the lol sorting signal is required to correctly target rcsf to the outer membrane. the length of the linker is important, but the sequence is not, on the condition that the linker does not fold into a defined secondary structure. the disordered linker is required for optimal processing by lol our finding that n-terminal disordered linkers function as molecular determinants of the targeting of lipoproteins to the outer membrane raised the question of whether these linkers work in a lol-dependent or lol-independent manner. to address this mechanistic question, we tested the impact of deleting lpp on the targeting of rcsf∆ - . the lipoprotein lpp, also known as the braun lipoprotein, covalently tethers the outer membrane to the peptidoglycan and controls the size of the periplasm , . being expressed at ~ million copies per cell , lpp is numerically the most abundant protein in e. coli. thus, by deleting lpp, we considerably decreased the load on the lol system by removing its most abundant substrate. remarkably, lpp deletion fully rescued the targeting of rcsf∆ - to the outer membrane (fig. 
a), indicating that the linker functions in a lol-dependent manner and suggesting that accumulation of rcsf∆ - in the inner membrane results from a decreased ability of the lol system to process the linker-less rcsf variant. importantly, similar results were obtained with nlpd∆ - , which was also correctly targeted to the outer membrane in cells lacking lpp (fig. a). pal∆ - could not be tested because membrane fractionation failed with lpp pal double mutant cells whether or not they expressed pal∆ - (data not shown). to obtain further insights into the mechanism at play here, we next monitored whether linker deletion impacted the transfer of rcsf from lola to lolb in vitro. lola with a c-terminal his- tag was expressed in the periplasm of cells expressing wild-type rcsf or rcsf∆ - and purified to near homogeneity via affinity chromatography (methods; extended data fig. ). both rcsf and rcsf∆ - were detected in immunoblots of the fractions containing purified lola (extended data fig. ), indicating that both proteins form a soluble complex with lola and confirming that they use this chaperone for transport across the periplasm. lolb was expressed as a soluble protein in the cytoplasm and purified by taking advantage of a c-terminal strep- tag; lolb was then incubated with lola-rcsf or lola-rcsf∆ - and pulled-down using streptactin beads (methods). as both rcsf and rcsf∆ - were detected in the lolb-containing pulled-down fractions (fig. b), we conclude that both proteins were transferred from lola to lolb. thus, the linker is not required for the transfer of rcsf from lola to lolb. finally, we focused on the lolcde abc transporter in charge of extracting outer membrane lipoproteins and transferring them to lola. over-expression (extended data fig. a) of all components of this complex failed to rescue normal targeting of rcsf∆ - to the outer membrane (extended data fig. b). 
likewise, over-expressing the enzymes involved in lipoprotein maturation (lgt, lspa, and lnt; fig. ) had no impact on membrane targeting (extended data fig. a, b). thus, taken together, our results suggest that retention of rcsf∆ - in the inner membrane does not result from the impairment of a specific step, but rather from less efficient processing of the truncated lipoprotein by the entire lipoprotein maturation pathway (see discussion). discussion lipoproteins are crucial for essential cellular processes such as envelope assembly and virulence. however, despite their functional importance and their potential as targets for new antibacterial therapies, we only have a vague understanding of the molecular factors that control their biogenesis. by discovering the role played by n-terminal disordered linkers in lipoprotein sorting, this study adds an important new layer to our comprehension of lipoprotein biogenesis in gram-negative bacteria. critically, it also indicates that the current model of lipoprotein sorting—that sorting between the two membranes is controlled by the or residues that are adjacent to the lipidated cysteine —needs to be revised. lipoproteins with unstructured linkers at their n-terminus are commonly found in gram-negative bacteria including many pathogens (see below); further work will be required to determine whether these linkers control lipoprotein targeting in organisms other than e. coli, laying the foundation for designing new antibiotics. it was previously shown that both lola and lolb (but not lolcde) can be deleted under specific conditions , suggesting at least one alternate route for the transport of lipoproteins across the periplasm and their delivery to the outer membrane. during this investigation, we envisaged the possibility that the linker could be required to transport lipoproteins via a yet-to-be- identified pathway independent of lola/lolb. 
however, our observations that both rcsf and rcsf∆ - were found in complex with lola (extended data fig. ) and were transferred by lola to lolb (fig. b) do not support this hypothesis. instead, our data clearly indicate that lipoproteins with n-terminal linkers still depend on the lol system for extraction from the inner membrane and transport to the outer membrane (extended data fig. a); they also suggest that n-terminal linkers improve lipoprotein processing by lol (see below). we note that two of the lipoproteins under investigation here, pal and rcsf, have been reported to be surface-exposed. a topology model has been proposed to explain how rcsf reaches the surface: the lipid moiety of rcsf is anchored in the outer leaflet of the outer membrane while the n-terminal linker is exposed on the cell surface before being threaded through the lumen of β-barrel proteins. thus, in this topology, the linker allows rcsf to cross the outer membrane. it is therefore tempting to speculate that n-terminal disordered linkers may be used by lipoproteins as a structural device to cross the outer membrane and reach the cell surface. it is worth noting that n-terminal linkers are commonly found in lipoproteins expressed by the pathogens borrelia burgdorferi and neisseria meningitidis; lipoprotein surface exposure is common in these pathogens. in addition, the accumulation of rcsf∆ - in the inner membrane (fig. a) also suggests that lol may be using n-terminal linkers to recognize lipoproteins destined for the cell surface before their extraction from the inner membrane, in order to optimize their targeting to the machinery exporting them to their final destination (bam in the case of rcsf). investigating whether a dedicated lol-dependent route exists for surface-exposed lipoproteins will be the subject of future research. our work also delivers crucial insights into the functional importance of disordered segments in proteins in general.
most proteins are thought to present portions that are intrinsically disordered. for instance, it is estimated that - % of eukaryotic proteins contain regions that do not adopt a defined secondary structure in vitro . however, demonstrating that these unstructured regions are functionally important in vivo is challenging. by showing that an n- terminal disordered segment downstream of the lol signal is required for the correct sorting of lipoproteins, our work provides direct evidence that evolution has selected intrinsic disorder by function. in conclusion, the data reported here establish that the triage of lipoproteins between the inner and outer membranes is not solely controlled by the lol sorting signal; additional molecular determinants, such as protein intrinsic disorder, are also involved. our data further highlight the previously unrecognized heterogeneity of the important lipoprotein family and call for a careful evaluation of the maturation pathways of these lipoproteins. data availability all data generated or analysed during this study are included in this published article and its supplementary information file. references . silhavy, t.j., kahne, d. & walker, s. the bacterial cell envelope. cold spring harb perspect biol , a ( ). . weiner, j.h. & li, l. proteome of the escherichia coli envelope and technological challenges in membrane proteome analysis. biochim biophys acta , - ( ). . ricci, d.p. & silhavy, t.j. outer membrane protein insertion by the β-barrel assembly machine. ecosal plus ( ). . chimalakonda, g. et al. lipoprotein lpte is required for the assembly of lptd by the beta-barrel assembly machine in the outer membrane of escherichia coli. proc natl acad sci u s a , - ( ). . sherman, d.j. et al. lipopolysaccharide is transported to the cell surface by a membrane-to-membrane protein bridge. science , - ( ). . malinverni, j.c. et al. yfio stabilizes the yaet complex and is essential for outer membrane protein assembly in escherichia coli. 
mol microbiol , - ( ). . laloux, g. & collet, j.f. "major tom to ground control: how lipoproteins communicate extra-cytoplasmic stress to the decision center of the cell". j bacteriol ( ). . kovacs-simon, a., titball, r.w. & michell, s.l. lipoproteins of bacterial pathogens. infect immun , - ( ). . szewczyk, j. & collet, j.f. the journey of lipoproteins through the cell: one birthplace, multiple destinations. adv microb physiol , - ( ). . babu, m.m. et al. a database of bacterial lipoproteins (dolop) with functional assignments to predicted lipoproteins. j bacteriol , - ( ). . narita, s.i. & tokuda, h. bacterial lipoproteins; biogenesis, sorting and quality control. biochim biophys acta mol cell biol lipids , - ( ). . horler, r.s., butcher, a., papangelopoulos, n., ashton, p.d. & thomas, g.h. echolocation: an in silico analysis of the subcellular locations of escherichia coli proteins and comparison with experimentally derived locations. bioinformatics , - ( ). . tokuda, h. biogenesis of outer membranes in gram-negative bacteria. biosci biotechnol biochem , - ( ). . tokuda, h. & matsuyama, s. sorting of lipoproteins to the outer membrane in e. coli. biochim biophys acta , in - ( ). . gennity, j.m. & inouye, m. the protein sequence responsible for lipoprotein membrane localization in escherichia coli exhibits remarkable specificity. j biol chem , - ( ). . terada, m., kuroda, t., matsuyama, s.i. & tokuda, h. lipoprotein sorting signals evaluated as the lola-dependent release of lipoproteins from the cytoplasmic membrane of escherichia coli. j biol chem , - ( ). . hara, t., matsuyama, s. & tokuda, h. mechanism underlying the inner membrane retention of escherichia coli lipoproteins caused by lol avoidance signals. j biol chem , - ( ). . narita, s. & tokuda, h. amino acids at positions and determine the membrane specificity of pseudomonas aeruginosa lipoproteins. j biol chem , - ( ). . lewenza, s., mhlanga, m.m. & pugsley, a.p. 
novel inner membrane retention signals in pseudomonas aeruginosa lipoproteins. j bacteriol , - ( ). . lorenz, c., dougherty, t.j. & lory, s. correct sorting of lipoproteins into the inner and outer membranes of pseudomonas aeruginosa by the escherichia coli lolcde transport system. mbio ( ). . grabowicz, m. & silhavy, t.j. redefining the essential trafficking pathway for outer membrane lipoproteins. proc natl acad sci u s a , - ( ). . konovalova, a. & silhavy, t.j. outer membrane lipoprotein biogenesis: lol is not the end. philos trans r soc lond b biol sci ( ). . wilson, m.m. & bernstein, h.d. surface-exposed lipoproteins: an emerging secretion phenomenon in gram-negative bacteria. trends microbiol , - ( ). . zuckert, w.r. secretion of bacterial lipoproteins: through the cytoplasmic membrane, the periplasm and beyond. biochim biophys acta , - ( ). . pride, a.c., herrera, c.m., guan, z., giles, d.k. & trent, m.s. the outer surface lipoprotein vola mediates utilization of exogenous lipids by vibrio cholerae. mbio , e - ( ). . valguarnera, e., scott, n.e., azimzadeh, p. & feldman, m.f. surface exposure and packing of lipoproteins into outer membrane vesicles are coupled processes in bacteroides. msphere ( ). . sueki, a., stein, f., savitski, m.m., selkrig, j. & typas, a. systematic localization of escherichia coli membrane proteins. msystems ( ). . li, g.w., burkhardt, d., gross, c. & weissman, j.s. quantifying absolute protein synthesis rates reveals principles underlying allocation of cellular resources. cell , - ( ). . gonnet, p., rudd, k.e. & lisacek, f. fine-tuning the prediction of sequences cleaved by signal peptidase ii: a curated set of proven and predicted lipoproteins of escherichia coli k- . proteomics , - ( ). . cho, s.h. et al. detecting envelope stress by monitoring beta-barrel assembly. cell , - ( ). . heidrich, c. et al. involvement of n-acetylmuramyl-l-alanine amidases in cell separation and antibiotic-induced autolysis of escherichia coli. 
mol microbiol , - ( ). . uehara, t., parzych, k.r., dinh, t. & bernhardt, t.g. daughter cell separation is controlled by cytokinetic ring-activated cell wall hydrolysis. embo j , - ( ). . gerding, m.a., ogata, y., pecora, n.d., niki, h. & de boer, p.a. the trans-envelope tol-pal complex is part of the cell division machinery and required for proper outer- membrane invagination during cell constriction in e. coli. mol microbiol , - ( ). . hussein, n.a., cho, s.h., laloux, g., siam, r. & collet, j.f. distinct domains of escherichia coli igaa connect envelope stress sensing and down-regulation of the rcs phosphorelay across subcellular compartments. plos genet , e ( ). . tsang, m.j., yakhnina, a.a. & bernhardt, t.g. nlpd links cell wall remodeling and outer membrane invagination during cytokinesis in escherichia coli. plos genet , e ( ). . shrivastava, r., jiang, x. & chng, s.s. outer membrane lipid homeostasis via retrograde phospholipid transport in escherichia coli. mol microbiol , - ( ). . farris, c., sanowar, s., bader, m.w., pfuetzner, r. & miller, s.i. antimicrobial peptides activate the rcs regulon through the outer membrane lipoprotein rcsf. j bacteriol , - ( ). . cohen, e.j., ferreira, j.l., ladinsky, m.s., beeby, m. & hughes, k.t. nanoscale-length control of the flagellar driveshaft requires hitting the tethered outer membrane. science , - ( ). . asmar, a.t. et al. communication across the bacterial cell envelope depends on the size of the periplasm. plos biol , e ( ). . grabowicz, m. lipoprotein transport: greasing the machines of outer membrane biogenesis: re-examining lipoprotein transport mechanisms among diverse gram- negative bacteria while exploring new discoveries and questions. bioessays , e ( ). . michel, l.v. et al. dual orientation of the outer membrane lipoprotein pal in escherichia coli. microbiology , - ( ). . konovalova, a., perlman, d.h., cowles, c.e. & silhavy, t.j. 
transmembrane domain of surface-exposed outer membrane lipoprotein rcsf is threaded through the lumen of beta-barrel proteins. proc natl acad sci u s a , e - ( ). . brooks, c.l., arutyunova, e. & lemieux, m.j. the structure of lactoferrin-binding protein b from neisseria meningitidis suggests roles in iron acquisition and neutralization of host defences. acta crystallogr f struct biol commun , - ( ). . noinaj, n. et al. structural basis for iron piracy by pathogenic neisseria. nature , - ( ). . rodriguez-alonso, r. et al. structural insight into the formation of lipoprotein-beta- barrel complexes. nat chem biol , - ( ). . bardwell, j.c. & jakob, u. conditional disorder in chaperone action. trends biochem sci , - ( ). . majdalani, n., hernandez, d. & gottesman, s. regulation and mode of action of the second small rna activator of rpos translation, rpra. mol microbiol , - ( ). . baba, t. et al. construction of escherichia coli k- in-frame, single-gene knockout mutants: the keio collection. mol syst biol , ( ). . cherepanov, p.p. & wackernagel, w. gene disruption in escherichia coli: tcr and kmr cassettes with the option of flp-catalyzed excision of the antibiotic-resistance determinant. gene , - ( ). . gil, d. & bouche, j.p. cole -type vectors with fully repressible replication. gene , - ( ). . yu, d. et al. an efficient recombination system for chromosome engineering in escherichia coli. proc natl acad sci u s a , - ( ). . sklar, j.g. et al. lipoprotein smpa is a component of the yaet complex that assembles outer membrane proteins in escherichia coli. proc natl acad sci u s a , - ( ). . miller, j.c. experiments in molecular genetics, (cold spring harbor laboratory press, new york, ). . Šali, a. & blundell, t.l. comparative protein modelling by satisfaction of spatial restraints. journal of molecular biology , - ( ). . pettersen, e.f. et al. ucsf chimera - a visualization system for exploratory research and analysis. journal of computational chemistry , - ( ). . 
guzman, l.m., belin, d., carson, m.j. & beckwith, j. tight regulation, modulation, and high-level expression by vectors containing the arabinose pbad promoter. j bacteriol , - ( ). acknowledgments we thank asma boujtat for technical help. we are indebted to the members of the collet laboratory and to nassos typas (embl, heidelberg) for helpful suggestions and discussions and to tom silhavy (princeton) for providing bacterial strains. j.s. was a research fellow of the fria and j.f.c. is an investigator of the frfs-welbio. this work was funded by the welbio, by grants from the f.r.s.-fnrs, from the fédération wallonie-bruxelles (arc / - ), from the european commission via the international training network train target ( ), and from the eos excellence in research program of the fwo and frs-fnrs (g g n). author contributions j.-f.c., j.e.r., j.s., and s.h.c. designed and performed the experiments. j.e.r., j.s., and s.h.c. constructed the strains and cloned the constructs. j.-f.c., j.e.r., j.s., s.h.c., and a.m. analyzed and interpreted the data. b.i.i. performed the structural analysis. j.-f.c., j.e.r., and j.s. wrote the manuscript. all authors discussed the results and commented on the manuscript. figure legends figure . structural analysis of lipoproteins reveals that half of outer membrane lipoproteins display an intrinsically disordered linker at the n-terminus. structures were generated via comparative modeling (methods). x-ray and cryo-em structures are green, nmr structures are cyan, and structures built via comparative modeling from the closest analog in the same pfam group are orange. in all cases, the n-terminal linker is magenta. lipoproteins targeting the outer membrane: pal, osme, nlpe, nlpc, mltb, nlpi, mltc, rcsf, yaji, ycfl, ybay, rlpa, nlpd, ycal. the remaining lipoproteins are shown in extended data figure . figure . the n-terminal linker displayed by lipoproteins is important for outer membrane targeting. a, b, c. 
the outer membrane (om) and inner membrane (im) were separated via centrifugation in a three-step sucrose density gradient (methods). while (a) rcsfwt, (b) nlpdwt, and (c) palwt were found predominantly in the om, rcsf∆ - , nlpd∆ - , and pal∆ - were substantially retained in the im. data are presented as the ratio of signal intensity in a single fraction to the total intensity in all fractions. all variants were expressed from plasmids (extended data table ). lpp and dsbd were used as controls for the om and im, respectively. d. the rcs system is constitutively active when rcsf’s linker is missing. rcs activity was measured with a beta-galactosidase assay in a strain harboring a transcriptional rpra::lacz fusion (methods). results were normalized to expression levels of rcsf variants (mean ± standard deviation; n = biologically independent experiments). e. phase-contrast images of the envc::kan ∆nlpd mutant complemented with nlpdwt or nlpd∆ - . nlpd∆ - only partially rescues the chaining phenotype of the envc::kan ∆nlpd double mutant. scale bar, µm. f. expression of pal∆ - does not rescue the sensitivity of the pal::kan mutant to sds-edta. cells were grown in lb medium at °c until od = . . tenfold serial dilutions were made in lb, plated onto lb agar or lb agar supplemented with . % sds and . mm edta, and incubated at °c. images in a, b, c, e, and f are representative of biological triplicates. graphs in a, b, and c were created by spline analysis of curves representing a mean of three independent experiments. figure . the length and the disordered character of the rcsf linker play key roles in rcsf targeting to the outer membrane. a. the outer membrane (om) and inner membrane (im) were separated via centrifugation in a three-step sucrose density gradient (methods). lpp and dsbd were used as controls for the om and im, respectively. the longer the linker, the more protein was correctly translocated to the om.
bar graphs denote mean ± standard deviation of n = biologically independent experiments. images are representative of experiments and immunoblots performed in biological triplicate. b. rcs activity was measured with a beta-galactosidase assay in a strain harboring a transcriptional rpra::lacz fusion (methods). results were normalized to expression levels of rcsf variants (mean ± standard deviation of n = biologically independent experiments). rcs activity relates to the quantity of rcsf retained in the inner membrane. c. rcsf mutants harboring alpha helical linkers (rcsffkpa and rcsfcol) were subjected to two consecutive centrifugations in sucrose density gradients (methods). both mutants were inefficiently translocated from the im to the om (mean ± standard deviation of n = biologically independent experiments). images are representative of experiments and immunoblots performed in biological triplicate. d. the rcs system was constitutively active in rcsffkpa and rcsfcol strains; activation levels were comparable to those of rcsf∆ - . rcs activity was measured as in b. results were normalized as in b. figure . n-terminal disordered linkers interact with the lol system to target lipoproteins to the outer membrane. a. deleting lpp rescues normal targeting of rcsf∆ - and nlpd∆ - to the outer membrane. the outer and inner membranes were separated via centrifugation in a sucrose density gradient (methods). whereas rcsf∆ - and nlpd∆ - accumulate in the inner membrane of cells expressing lpp, the most abundant lol substrate, they are normally targeted to the outer membrane in cells lacking lpp (mean ± standard deviation of n = biologically independent experiments). b. in vitro pull-down experiments show that rcsfwt and rcsf∆ - are transferred from lola to lolb. lola-rcsfwt and lola- rcsf∆ - complexes were obtained by lola-his affinity chromatography followed by size exclusion chromatography (methods). 
each complex was incubated with lolb-strep that was previously purified via strep-tactin affinity chromatography (methods). both rcsf variants were eluted in complex with lolb-strep, while lola was only present in the flow through. i, input; ft, flow through; e, eluate. methods bacterial growth conditions bacterial strains used in this study are listed in extended data table . bacterial cells were cultured in luria broth (lb) at °c unless stated otherwise. the following antibiotics were added when appropriate: spectinomycin ( µg/ml), ampicillin ( µg/ml), chloramphenicol ( µg/ml), and kanamycin ( µg/ml). l-arabinose ( . %) and isopropyl-β-d-thiogalactoside (iptg) were used for induction when appropriate. bacterial strains and plasmids dh (a derivative of escherichia coli mg carrying a chromosomal rpra::lacz fusion at the λ attachment site ) was used as wildtype throughout the study. all deletion mutants were obtained by transferring the corresponding alleles from the keio collection (kanr) into dh via p phage transduction. deletions were verified by pcr and the absence of the protein was verified via immunoblotting (when possible). if necessary, the kanamycin cassette was removed via site-specific recombination mediated by the yeast flp recombinase with pcp vector . all strains expressing the rcsf mutants used for subcellular fractionation lacked rcsb in order to prevent induction of rcs. the plasmids used in this study are listed in extended data table and the primers appear in extended data table . rcsf, pal, and nlpd were expressed from the low-copy vector pam containing the sc origin of replication and the lac promoter. to produce psc for rcsf expression, rcsf (including approximately base pairs upstream of the coding sequence) was amplified by pcr from the chromosome of dh (primer pair sh_rcsf(psti)- r and sh_rcsfu-r (kpni)-f).
the amplification product was digested with kpni and psti and inserted into pam , resulting in psc . nlpd was amplified using primers jr and jr and pal was amplified with primers js and js . amplification products were digested with psti-xbai and kpni-xbai, respectively, generating pjr (for nlpd expression) and pjs (for pal expression). to clone rcsfΔ - , the nucleotides encoding the rcsf signal sequence were amplified using primers sh_rcsfur(kpni)_f and sh_rcsfss-fsg (ncoi)_r, and those encoding the rcsf signaling domain were amplified using primers sh_rcsfss-fsg (ncoi)_r and sh_rcsf(psti)_r. in both cases, psc was used as template. then, overlapping pcr was performed using sh_rcsfur(kpni)_f and sh_rcsf(psti)_r from the two pcr products previously obtained. the final product was digested with kpni and psti, and ligated with pam pre-digested with the same enzymes, yielding psc . to add a gs linker (ser-gly- ser-gly-ser-gly-ala-met) into psc , the primers sh_gs linker_f and sh_gs linker_r were mixed, boiled, annealed at room temperature, and ligated with psc pre-digested with ncoi, generating psc . psc was generated similarly, but using primers sh_sg linker_f and sh_sg linker_r and plasmid psc . psc was generated using primers sh_da linker_f and sh_sg linker_r and plasmid psc . the pal allele lacking the linker region (palΔ - ) was created via overlapping pcr. the pjs plasmid served as template for pcr with the m r/m f external primers and js /js internal primers. the truncated allele was cloned into pam at the same restriction sites as the full-length allele, producing pjs . the nlpd allele lacking the linker regions (nlpdΔ - ) was created via overlapping pcr. e. coli chromosomal nlpd served as template for the pcr, with jr /jr as external primers and jr /jr as internal primers. the truncated allele was then cloned into pam at the same restriction sites as the full-length allele, producing pjr . 
rcsffkpa and rcsfcol were obtained by inserting dna sequences corresponding to helical linker fragments (fkpa ser -glu and colicin ia ile -lys ) into rcsfΔ - at ncoi and rsrii restriction sites. the fkpa gene fragment was amplified from the e. coli mc chromosome (js /js primers) and the cia gene fragment was chemically synthetized as a gene block by integrated dna technologies (idt). the resulting plasmids were pjs and pjs , respectively. pam does not contain the laciq repressor. therefore, to enable expression- level regulation by iptg, strains containing the pam plasmids expressing rcsf variants were co-transformed with pet b, a high-copy plasmid from a different incompatibility group (pbr origin of replication; novagen) containing the laciq repressor. chromosomal insertion of rcsfΔ - was performed via λ-red recombineering with psim -tet plasmid (a gift of d. hughes). in the first step, the cat-sacb cassette was introduced and later replaced by mutant rcsf. the chromosomal lolcde operon was amplified via pcr using primers js and js (adding a c-terminal his-tag to lole) and then inserted into pbad using the restriction sites psti and xbai, resulting in pjr . the expression level of lole-his was verified via immunoblotting. the sequence encoding lolb without its n-terminal cysteine was first amplified from the chromosome via pcr using primers jr /pl (adding a c-terminal strep-tag). it was then cloned into pet a using the restriction sites xbai and psti. lola was amplified using chromosomal lola as pcr template for primers jr /jr (jr contains the sequence of a his-tag) and then cloned into pbad using kpni and xbai, resulting in pjr . the genes encoding lgt and lnt were amplified from the chromosome with pcr primers ag /ag and ag /jr , respectively. ag and jr also encode a myc-tag. pcr products were cloned into pam using kpni and psti. expression levels were verified via immunoblotting (data not shown). lspa was amplified with pcr primers jr /jr . 
the pcr product was cloned into psc , a modified pam with a ribosome binding site and a c-terminal flag tag, using ncoi and bamhi. expression of lspa-flag was induced by adding µm iptg. expression levels were verified with immunoblots (data not shown). cell fractionation and sucrose density gradients cell fractionation was performed as described previously with some modifications. four hundred milliliters of cells were grown until the optical density at nm (od ) of the culture reached . . cells were harvested via centrifugation at , x g at °c for min, washed with te buffer ( mm tris-hcl ph , mm edta), and resuspended in ml of the same buffer. the washing step was skipped for the ∆lpp strains to prevent the loss of outer membrane vesicles. dnase i ( mg; roche), mg rnase a (thermo scientific), and a half tablet of a protease inhibitor cocktail (complete edta-free protease inhibitor cocktail tablets; roche) were added to cell suspensions, and cells were passed through a french pressure cell at , psi. after adding mgcl to a final concentration of mm, the lysate was centrifuged at , x g at °c for min in order to remove cell debris. then, ml of supernatant were placed on top of a two-step sucrose gradient ( . ml of . m sucrose in mm hepes ph . and . ml of . m sucrose in mm hepes ph . ). the samples were centrifuged at , x g for h at °c in a . ti beckman rotor. after centrifugation, the soluble fraction and the membrane fraction were collected. the membrane fraction was diluted fourfold with mm hepes ph . . to separate the inner and the outer membranes, ml of the diluted membrane fraction were loaded on top of a second sucrose gradient ( . ml of . m sucrose, . ml of . m sucrose, ml of . m sucrose, always in mm hepes ph . ). the samples were then centrifuged at , x g for h at °c in a sw beckman rotor. approximately fractions of .
ml were collected and odd-numbered fractions were subjected to sds-page, transferred onto a nitrocellulose membrane, and probed with specific antibodies. graphs were created in graphpad prism via spline analysis of the curves representing a mean of three independent experiments. immunoblotting protein samples were separated via % or - % sds-page (life technologies) and transferred onto nitrocellulose membranes (ge healthcare life sciences). the membranes were blocked with % skim milk in mm tris-hcl ph . , . m nacl, and . % tween (tbs-t). tbs-t was used in all subsequent immunoblotting steps. the primary antibodies were diluted , to , times in % skim milk in tbs-t and incubated with the membrane for h at room temperature. the anti-rcsf, anti-dsbd, anti-lpp, anti-nlpd, anti-lola, and anti-lolb antisera were generated by our lab. anti-pal was a gift from r. lloubès, and anti-his is a peroxidase-conjugated antibody (qiagen). the membranes were incubated for h at room temperature with horseradish peroxidase-conjugated goat anti-rabbit igg (sigma) at a : , dilution. labelled proteins were detected via enhanced chemiluminescence (pierce ecl western blotting substrate, thermo scientific) and visualized using x-ray film (fuji) or a camera (image quant las and vilber fusion solo s). in order to quantify protein levels, band intensities were measured using imagej version . r (national institutes of health). β-galactosidase assay β-galactosidase activity was measured as described previously . graphs representing a mean of six experiments with standard deviation were prepared in graphpad prism. expression-level estimations were performed as follows. cultures used for β-galactosidase activity ( . ml per culture) were precipitated with % trichloroacetic acid, washed with ice-cold acetone, and resuspended in . ml laemmli sds sample buffer. samples ( µl) were subjected to sds-page and immunoblotted with anti-rcsf antibody.
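the two normalizations described in the methods above (the per-fraction signal ratio used for the sucrose gradient immunoblots, and the scaling of β-galactosidase activity to the rcsf expression level) can be sketched in python as follows. this is only an illustrative sketch under stated assumptions, not the analysis used in this study, which relied on imagej and graphpad prism; the function names are hypothetical and any numbers passed to them would be supplied by the reader.

```python
from statistics import mean, stdev


def fraction_ratios(intensities):
    """Return each gradient fraction's band intensity as a share of the total.

    `intensities` holds one band intensity per analysed fraction
    (hypothetical input, e.g. values exported from an image-analysis tool).
    """
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return [value / total for value in intensities]


def normalized_activity(activities, expression_levels):
    """Divide each replicate's activity by its relative expression level.

    Both lists must be in the same replicate order; expression levels are
    assumed to be relative band intensities from the matching immunoblot.
    """
    if len(activities) != len(expression_levels):
        raise ValueError("one expression level is required per activity value")
    return [a / e for a, e in zip(activities, expression_levels)]


def summarize(values):
    """Return (mean, sample standard deviation) across replicates."""
    return mean(values), stdev(values)
```

the ratios returned by `fraction_ratios` sum to one across the gradient, which makes profiles from different blots comparable, and `summarize` would be applied to the normalized replicate values to obtain a mean ± standard deviation.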
sds-edta sensitivity assay cells were grown in lb at °c until they reached an od of . . tenfold serial dilutions were made in lb and plated on lb agar supplemented with spectinomycin ( µg/ml) when necessary. plates were incubated at °c. to evaluate the sensitivity of the pal mutant, plates were supplemented with . % sds and . mm edta. microscopy image acquisition cells were grown in lb at °c until od = . . cells growing in exponential phase were spotted onto a % agarose phosphate-buffered saline pad for imaging. cells were imaged on a nikon eclipse ti -e inverted fluorescence microscope with a cfi plan apochromat dm lambda x oil, n.a. . , w.d. . mm objective. images were collected on a prime b mm camera (photometrics). we used a cy - c ( mm) filter cube (nikon). image acquisition was performed with nis-elements advanced research version . . protein purification jr cells were grown in lb supplemented with kanamycin ( µg/ml) at °c. when the culture od = . , the expression of cytoplasmic lolb-strep was induced with mm iptg. cells ( l) were pelleted when they reached od = and resuspended in ml of buffer a ( mm nacl and mm napi, ph ) containing one tablet of complete edta-free protease inhibitor cocktail (roche). cells were lysed via two passages through a french pressure cell at , psi. the lysate was centrifuged at , x g for min at °c in a ja rotor and the supernatant was mixed with strep-tactin resin (iba lifesciences) previously equilibrated with buffer a. after washing the resin with column volumes of buffer a, lolb-strep was eluted with column volumes of buffer a supplemented with mm desthiobiotin. lolb-strep was finally desalted using a pd column (ge healthcare). soluble lola-rcsfwt and lola-rcsfΔ - complexes were purified via affinity chromatography as follows. cells co-expressing lola either with wild-type rcsf (jr ) or rcsfΔ - (jr ) were grown in lb at °c supplemented with µg/ml ampicillin until od = . . protein expression was then induced with . % arabinose.
cells ( l) were pelleted at od = and resuspended in ml of buffer a containing one tablet of protease inhibitor cocktail. cells were lysed via two passages through a french pressure cell at , psi. the lysate was centrifuged at , x g for min at °c using a . ti beckman rotor. to obtain the soluble fraction, the supernatant was centrifuged at , x g for h at °c using the same rotor. the supernatant was added to a his trap hp column (merck) previously equilibrated with buffer a. the column was washed with column volumes of buffer a supplemented with mm imidazole and lola-his was eluted using a gradient of imidazole (from mm to mm). the fractions obtained were analyzed via sds-page; lola was detected around kda (data not shown). rcsf variants were detected via immunoblotting with an anti-rcsf antibody. fractions containing lola-rcsf variants were pooled, concentrated to ml using a vivaspin turbo concentrator (cut-off kda; sartorius), and purified via size- exclusion chromatography with a superdex s - / column (ge healthcare). pull down and transfer of rcsf variants from lola to lolb lolb-strep was incubated at °c for min under agitation with lola-rcsfwt or with lola- rcsfΔ - (lola-rcsfwt and lola-rcsfΔ - complexes were purified as described above). the mixture was added to magnetic strep beads (magstrep type beads, iba life science) previously equilibrated with buffer a and incubated for min at °c on a roller. after washing the beads with the same buffer, lolb-strep was eluted with buffer a supplemented with mm biotin. samples were analyzed via sds-page and lola and lolb were detected with coomassie brilliant blue (bio-rad). rcsf was detected via immunoblotting with an anti-rcsf antibody. structural analysis of lipoproteins when x-ray, cryo-em, or nmr structures were available, the missing residues were completed through comparative modeling using modeller version . . 
if no structure of the lipoprotein was available, then the most pertinent analogous structure from proteins belonging to the same pfam group was used as template for comparative modeling. the linker was defined as the unstructured fragment from the n-terminal cys of the mature form until the first residue with well-defined secondary structure (α-helix or β-strand) belonging to a globular domain. short, intermediate, and long linkers had lengths of < , - , and > residues, respectively. images were generated using ucsf chimera version . . . legends for figures in the extended data extended data figure . lipoprotein maturation and sorting in the e. coli cell envelope. a. after processing by lgt (step ), lspa (step ), and lnt (step ), a new lipoprotein either remains in the inner membrane or is extracted by the lolcde complex (step ), depending on the residues at position + and + . lolcde transfers the lipoprotein to the periplasmic chaperone lola (step ), which delivers the lipoprotein to lolb (step ). lolb, a lipoprotein itself, inserts the lipoprotein in the outer membrane using a poorly understood mechanism (step ). b. schematic of lipoprotein structural domains. the n-terminal signal sequence targets the lipoprotein to the cell envelope; the last four amino acid residues of the signal sequence form the lipobox. the last residue of the lipobox is the invariant cysteine that undergoes lipidation. this cysteine, which is the first residue of the mature lipoprotein, is directly followed by the sorting signal, a sequence of or amino acids that controls the sorting of mature lipoproteins between the inner and outer membranes. the c-terminal portion of a mature lipoprotein is a globular domain. an intrinsically disordered linker separates the sorting signal from the globular domain in about half of e. coli lipoproteins (fig. ; extended data fig. ; extended data table ). 
the lengths of the deleted disordered linkers of the unrelated lipoproteins rcsf, pal, and nlpd are indicated. lp, lipoprotein. extended data figure . structural analysis of lipoproteins reveals that half of outer membrane lipoproteins display an intrinsically disordered linker at the n-terminus. structures were generated via comparative modeling. x-ray and cryo-em structures are green, nmr structures are cyan, and structures built via comparative modeling from the closest analog in the same pfam group are orange. in all cases, the n-terminal linker is magenta. lipoproteins targeting the outer membrane: amid, bamb, bamc, hslj, mlta, loip, lpob, blc, bame, csgg, emta, gfce, bamd, lpoa, lolb, lpte, mlaa, mlic, yddw, yedd, yghg, yfey, ybjp, yiad, ybhc, pqic, yger, yfib, yrap. lipoproteins targeting the im: dcrb, metq, nlpa, ycjn, yehr, apbe. synthetic constructs: rcsfgs, rcsfgs , rcsfgs , rcsf∆ - , rcsffkpa, rcsfcol, nlpd∆ - , pal∆ - . extended data figure . expression levels of rcsf∆ - , pal∆ - , and nlpd∆ - . cells were grown at °c in lb until od = . and precipitated with trichloroacetic acid (methods). immunoblots were performed with a-rcsf, a-nlpd, and a-pal antibodies (methods). all images are representative of three independent experiments. extended data figure . schematic of rcsf variants used in this study and their distributions in the outer membrane (om) and inner membrane (im). rcsfgs, rcsfgs , and rcsfgs have linkers that are disordered and mostly consist of gs repeats. the linker of rcsfgs is the same length as the linker of rcsfwt. rcsfgs and rcsfgs are shorter than rcsfwt. regions of rcsffkpa and rcsfcol fold into alpha helices borrowed from the sequences of fkpa and colicin ia, respectively. extended data figure . complexes between lola and rcsfwt or rcsf∆ - can be purified. both rcsfwt (a) and rcsf∆ - (b) were eluted in complex with lola-his via affinity chromatography followed by size exclusion chromatography. 
gel filtration was performed with a superdex s - / column. samples were analyzed via sds-page and proteins, including lola-his, were stained with coomassie brilliant blue (methods). rcsf variants were detected by immunoblotting fractions with a-rcsf antibodies. images are representative of three independent experiments.

extended data figure . overexpression of lolcde does not restore targeting of rcsf∆ - . a. expression level of lolcde-his. cells were grown in lb plus . % arabinose at °c until od = . (methods). membrane and soluble fractions were separated with a sucrose density gradient (methods). lole-his was detected in the membrane fraction by immunoblotting with a-his (methods). images are representative of three independent experiments. b. the outer membrane (om) and inner membrane (im) were separated with a sucrose density gradient. expression of lolcde did not rescue om targeting of rcsf∆ - . images are representative of experiments performed in biological triplicate.

extended data figure . overexpressing lgt, lspa, and lnt does not rescue the targeting of rcsf∆ - to the outer membrane. a. expression levels of lgt, lspa, and lnt. cells were grown in lb (plus µm iptg for cells expressing lspa) at °c until od = . (methods). outer membrane (om) and inner membrane (im) were separated with a sucrose density gradient (methods). lgt-myc and lnt-myc were detected in the im via immunoblotting with a-myc. lspa-flag was detected in the im with a-flag. b. cells overexpressing lgt, lspa, or lnt were fractionated with a sucrose density gradient (methods). rcsf∆ - was retained in the im in all conditions. images are representative of three independent experiments.

extended data figures (seven figures; images not reproduced)

extended data tables

extended data table : list of the verified lipoproteins of e. coli used for the structural analysis in this study. 
attached excel sheet extended data table : rcsf mutants used in this study and the amino acid sequences of their corresponding n-terminal linkers. the acylated cysteine is the first residue listed. rcsf linkers amino acid sequence rcsfwt csmlsrspvepvqstapqpkaepakpkapratpv rcsfΔ - csmgpv rcsfgs csmslfdapamsgsgsgamsgsgsgampv rcsfgs csmsgsgsgamsgsgsgampv rcsfgs csmsgsgsgampv rcsffkpa csmgsdqeieqtlqafearvkssaqakmekdaadnepv rcsfcol csmgildtrlseleknggaalavldaqqarllgqqtrndraisearnkl ssvteslntarnaltraeqqltqqkpv extended data table : e. coli strains used in this study. strains genotype and description source dh rpra-lacz mg (argf-lac) u keio collection single mutants rcsf::kan, rcsb::kan, pal::kan, nlpd::kan, envc::kan xl -blue enda gyra (nalr) thi- reca rela lac glnv f’ [::tn proab+ laciq d(lacz)m ] hsdr (rk- mk+) stratagene bl f- ompt hsdsb (rb- mb-) gal dcm (de ) novagen js dh drcsf pam this study js dh drcsf pjs this study js dh drcsf rcsb::kan pet b this study js js pjs this study js dh pal::kan this study js js pjs this study js js pjs this study js dh drcsf pjs this study js js pjs this study js dh drcsf psc this study js dh drcsf psc this study js js psc this study js js psc this study js js psc this study js js psc this study js js psc this study js dh drcsf psc this study js dh drcsf psc this study js dh drcsf psc this study js drcsb lpp::kan rcsf::rcsfd - this study jr nlpd::kan this study jr jr pjr this study jr jr pjr this study jr dh pam this study jr bl rcsf::kan this study jr jr pet -cytoplasmic lolb-strep this study jr rcsb::kan rcsf::rcsfd - this study jr dnlpd this study jr dnlpd envc::kan this study jr jr pjr this study jr jr pjr this study jr jr pam this study jr jr pag this study jr jr pjr this study jr jr pbad this study jr jr pjr this study jr jr pjr this study jr jr lpp::kan this study jr jr pjr this study jr js pam this study jr jr psc this study jr rcsb::kan rcsf::rcsfd - pjr this study jr rcsb::kan pjr this study jr rcsb::kan rcsf::rcsfd - pbad 
this study jr rcsb::kan pbad this study extended data table : plasmids used in this study. plasmids features source pam iptg-regulated plac, psc -based, spectinomycin (no laciq) pbad arabinose inducible pbad, ampicillin pbad arabinose inducible pbad, chloramphenicol pet a iptg regulated t promoter, kanamycin novagen pet b iptg regulated t promoter, ampicillin novagen pcp flp+, l ci +, l pr repts, ampicillin, chloramphenicol psim -tet psc plasmid, repats, tetra, l-red (gram-beta-exo), ci , tetracycline gift from d. hughes pjs pam rcsffkpa fkpa linker (s -e ) this study pjs pam palwt this study pjs pam pald - this study pjs pam rcsfcol colicin ia linker (i -k ) this study psc pam rcsfgs (c s m s gsgsgamg) this study psc pam rcsfgs (c s m s gsgsgamsgsgsgam g) this study psc pam rcsfgs (c s m s lfdapamsgsgsgam sgsgsgamg) this study psc pam rcsfd - (c s m g p ) this study psc pam rcsfwt this study pjr pam nlpdwt this study pjr pam nlpdd - (c s d a ) this study pjr pbad lola- xhis this study pjr pet cytoplasmic lolb-strep this study pjr pbad lolcde- xhis this study pjr pam lnt-myc this study pjr psc lspa-flag this study psc pam , iptg-regulated plac , laciq, triple flag tag this study pag pam lgt-myc this study extended data table : primers used in this study. 
primer sequence ’ to ’ js _fkpalinker _fw acatccatggggtccgaccaagagatcgaac js _fkpalinker _rv atgtcggaccggttcgttatcagccgcgtc js _pal_- b cgtcttccggcaactgatgg js _pal_+ b ttggtgcctgagcaaaagcg js _pal_fw acatggtaccttaattgaatagtaaaggaatc js _pal_rv atgttctagattagtaaaccagtaccgcac js _palnolink er_overlappcr_ fw tgttcttccaaccaggctcgtctgcaaatg js _palnolink er_overlappcr_ rv cagacgagcctggttggaagaacatgccgc js _lolcdehi s_fw acattctagatctttgctacagcaaccagac js _lolcde_ his_rv atgtctgcagttagtgatggtgatggtgatgaccctggccgctaaggactcg js _lred_catsa cbin_rcsf_fw tcctgattcaatattgacgttttgatcatacattgaggaaatactaaaatgagacgttgatcgg cacg js _lred_catsa cbin_rcsf_rev tatagggcgagcgaataacgcctatttgctcgaactggaaactgcatcaaagggaaaactgt cca js _lred_rcsf _catsacbout_fw tcctgattcaatattgacgttttgatcatacattgaggaaatactatgcgtgctttaccgatctg tt js _lred_rcsf _catsacbout_rv tatagggcgagcgaataacgcctatttgctcgaactggaaactgctcatttcgccgtaatgtt aagc js _junction lr ed_rcsfup_fw gcggagctgttaaaggctg js _junction lr ed_rcsfdown_rv gagcaatgagatgcagttcg js _junction lr ed_cat-out_rv cgggcaagaatgtgaataaagg js _junction lr ed_sacb-out_fw gctgtacctcaagcgaaagg m r caggaaacagctatgaccatg m f tgtaaaacgacggccagt pl _rcsf_- b cgctttttaccagacctggc pl _rcsf_+ atatcattcaggacgggcgcttgccc pl _rcsb_- b acatctgattcgtgagaagg pl _rcsb+ b taatgggaatcgtaggccgg pl _fw_lpp_- caatttttttatctaaaacccagcg pl _rv_lpp_+ ccagagcaagggaatatgttacgcg sh_da linker_f catgagcttattcgacgcgccggc sh_da linker_r catggccggcgcgtcgaataagct sh_rcsf(psti)_r gagactgcagtcatttcgccgtaatgttaag sh_rcsfur(kpn i)_f gagggtacccgttttgatcatacattg rcsfss-fsg (ncoi)_f gcggctgttccatggggccggtccgaatttatac rcsfss-fsg (ncoi)_r ggaccggccccatggaacagccgcttagcatgag sh_gs linker_f catgagtggctctggatctggtgc sh_gs linker_r catggcaccagatccagagccact jr _nlpd_fw gagatctagattattaaccaatttttcctgggggataa jr _nlpd_rv agagctgcagttatcgctgcggcaaataacgca jr _nlpdoverlap _fw ggctggcaggctgttctgacgcgcagcaaccgcaaattca jr _nlpdoverlap _rv tgaatttgcggttgctgcgcgtcagaacagcctgccagcc jr _fw_nlpd- 
caggtcagcgtatcgtgaacatc jr _rv_nlpd+ tcatttaaatcatgaactttcagcg jr _fw_lola_- _pbad acatggtacccgggagtgacgtaatttgaggaat jr _rev_lola_ his_pbad atgttctagattaatgatgatgatgatgatgctcgagcttacgttgatcatctacc gtgac jr _rev_cytopl asmic_lolb_nost op_streptag_stop ccaactcgagtcacttttcgaactgcgggtggctccagcttgcttt cactatccagttatccat jr -fw-- - envc gttgtcgctg atgggta jr -rev- + envc aatcatcaatgacgatggca jr -rev-lnt- myctag-psti aaaaactgcagctacaggtcttcttcgctaatcagtttctgttcgcttgctttacgtcgctg acgcagac jr -fw-ncoi- lspa gagaccatgggtagtcaatcgatctgttcaac jr -rev-lspa- no stop-bamhi gagaggatccttgttttttcgctctag ag _lgt_- _fw_kpni aaaaaggtaccttcaatcgctgttctctttc ag _lnt_- _fw_kpni aaaaaggtaccaccccagccgaagctggatg ag _lgt_myc ct_psti aaaaactgcagctacaggtcttcttcgctaatcagtttctgttcgcttgcggaaacgtgtt gctgtgggc pl - lolbwoss-fw- ncoi acacccatggccgttaccacgcccaaagg colicinialinker_ geneblock acatccatggggattctggacacgcggttgtcagagctggaaaaaaatg gcggggcagcccttgccgttcttgatgcacaacaggcccgtctgc tcgggcagcagacacggaatgacagggccatttcagaggcacgg aataaactcagttcagtgacggaatcgcttaacacggcccgtaat gcattaaccagagctgaacaacagctgacgcaacagaaagcggtccg acat boosting detection of low abundance proteins in thermal proteome profiling experiments by addition of an isobaric trigger channel to tmt multiplexes boosting detection of low abundance proteins in thermal proteome profiling experiments by addition of an isobaric trigger channel to tmt multiplexes sarah a. peck justice†, neil a. mccracken, josé f. victorino‡, aruna b. wijeratne, amber l. mosley* department of biochemistry and molecular biology, indiana university school of medicine, indianapolis, indiana , united states abstract: the study of low abundance proteins is a challenge to discovery-based proteomics. mass-spectrometry (ms) applica- tions, such as thermal proteome profiling (tpp) face specific challenges in detection of the whole proteome as a consequence of the use of nondenaturing extraction buffers. 
tpp is a powerful method for the study of protein thermal stability, but quantitative accuracy is highly dependent on consistent detection. therefore, tpp can be limited in its amenability to study low abundance proteins that tend to have stochastic or poor detection by ms. to address this challenge, we incorporated an affinity purified protein complex sample at submolar concentrations as an isobaric trigger channel into a mutant tpp (mtpp) workflow to provide reproducible detection and quantitation of the low abundance subunits of the cleavage and polyadenylation factor (cpf) complex. the inclusion of an isobaric protein complex trigger channel increased detection an average of x for previously detected subunits and facilitated detection of cpf subunits that were previously below the limit of detection. importantly, these gains in cpf detection did not cause large changes in melt temperature (tm) calculations for other unrelated proteins in the samples, with a high positive correlation between tm estimates in samples with and without isobaric trigger channel addition. overall, the incorporation of affinity purified protein complex as an isobaric trigger channel within a tmt multiplex for mtpp experiments is an effective and reproducible way to gather thermal profiling data on proteins that are not readily detected using the original tpp or mtpp protocols.

proteins are the functional units of a cell, carrying out and controlling processes at specific times and locations to maintain homeostasis and respond to external stimuli. in line with these functional changes, proteins can exist in a variety of biophysical states within cells as a consequence of variants in their primary sequence, post-translational modification (ptm) state, and/or subcellular localization. in many cases a protein's biophysical state is impacted by associations with other proteins, including both transient and stable protein-protein interactions. 
the characterization of protein-protein interactions (ppis) is fundamental to gaining a full understanding of biological mechanism. in fact, ppis are so critical to proper protein function that disruptions in these interactions often lead to disease and/or cell death . advances in mass spectrometry (ms)-based proteomics workflows continue to increase our ability to study protein complex dynamics and ppis - . ms-based approaches for protein interaction analysis rely on discov- ery-based proteomics performed using data-dependent acquisition (dda). generally in dda, peptides with the most intense ions from ms are selected for fragmentation and ms analysis . this approach maximizes signal to noise levels and thereby increases confidence in the selection and subsequent identification of the peptide ions. challenges with the use of dda include selection of peptide ions from protein(s) of interest that are present at low relative abundance levels or when peptides of interest (such as ptm containing peptides) are pre- sent at low relative levels to their unmodified counterparts. low abun- dance peptides may be present at insufficient ms signal intensity lev- els to trigger fragmentation and ms analysis based on instrument set- tings for ms analysis. while fractionation and an extended hplc gra- dient help to spread out the elution of peptides into the mass spectrom- eter, many peptides may still co-elute such that highly abundant ion species will outcompete those that are less abundant . a number of strategies have recently emerged to improve ms detection of low abun- dance proteins and post-translational modifications (ptms) for a vari- ety of applications including single cell proteomics - . although we will not discuss all of the recently established strategies here, one such strategy, boosting to amplify signal with isobaric labeling (basil), has similarities that have informed the current work. 
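the dda selection problem described above, and the way an isobaric boosting channel can address it, can be illustrated with a toy model. this is only a sketch under stated assumptions: the intensities are hypothetical, the dda cycle is reduced to picking the top n precursors, and the ms1 signal of an isobarically labeled peptide is assumed to be the simple sum of its channel contributions. none of the names below come from the original work.

```python
from typing import Dict, List


def ms1_intensity(channels: List[float]) -> float:
    """Isobaric-labeled copies of a peptide co-elute, so the survey scan sees their sum."""
    return sum(channels)


def select_for_fragmentation(sample: Dict[str, List[float]], top_n: int) -> List[str]:
    """Pick the top_n precursors by summed intensity (a simplified DDA cycle)."""
    totals = {peptide: ms1_intensity(ch) for peptide, ch in sample.items()}
    return sorted(totals, key=totals.get, reverse=True)[:top_n]


# Hypothetical per-channel intensities (arbitrary units), three experimental channels.
no_boost = {
    "abundant_a": [900.0, 900.0, 900.0],
    "abundant_b": [700.0, 700.0, 700.0],
    "low_abundance": [20.0, 20.0, 20.0],
}

# Same channels plus one boosting channel enriched for the low-abundance peptide.
boosted = {
    "abundant_a": no_boost["abundant_a"] + [0.0],
    "abundant_b": no_boost["abundant_b"] + [0.0],
    "low_abundance": no_boost["low_abundance"] + [2500.0],
}

print(select_for_fragmentation(no_boost, top_n=2))
print(select_for_fragmentation(boosted, top_n=2))
```

in this toy example the low abundance peptide is selected only when the boosting channel is present; its reporter ions in the experimental channels would then become available for quantitation.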
specifically, basil has been shown to successfully increase detection of low abun- dance phosphopeptides through addition of a boosting sample to a tan- dem mass tag (tmt)-based multiplex . tmtpro labeling allows for the multiplexing and relative quantitation of up to samples - . as each tmt label is isobaric, labeled peptides from the multiplexed sam- ples elute into the mass spectrometer together and are analyzed simul- taneously as one ion peak during ms scans which is distinguished in fragment ion scans during msn (typically ms or ms ) analysis. by incorporating a phospho-enriched sample into a single channel in the tmt multiplex, yi et al increased ion abundance of phosphopeptides in the ms scan to the extent that ms was triggered for phosphopep- tides that were typically below the level of detection in standard dda approaches . basil allowed for the identification and quantification of phosphopeptides in other tmt channels where enrichment had not been performed . the basil method has since been optimized for de- tection of phosphopeptides in single cells and similar approaches have been applied to phosphotyrosine-containing peptides , silac- labeled peptides , and using synthetic peptides to particular peptides of interest . basil and other similar methods that take advantage of isobaric carrier channels could have numerous applications in dda- based quantitative workflows. the challenges to studying low abundance proteins in dda proteomics experiments extend in particular to the mass spectrometry-based ther- mal proteome profiling (tpp) methods and are the focus of this study. tpp analysis takes advantage of tmt labeling technology to produce protein melt curves that can then be compared across conditions to measure alterations in protein thermal stability , . 
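as a rough illustration of how a melt temperature can be read from a protein melt curve, the sketch below uses a three-parameter sigmoid of the kind commonly fitted to tpp data. the functional form and the parameter names (a, b, plateau) are assumptions made for illustration; the text does not specify the model, and the example parameter values are hypothetical.

```python
import math


def melt_curve(temp_c: float, a: float, b: float, plateau: float) -> float:
    """Assumed sigmoid: fraction of protein remaining soluble at temp_c."""
    return (1.0 - plateau) / (1.0 + math.exp(-(a / temp_c - b))) + plateau


def melting_temperature(a: float, b: float, plateau: float) -> float:
    """Temperature at which the assumed curve crosses 0.5 (closed form)."""
    if plateau >= 0.5:
        raise ValueError("curve never drops to 0.5; Tm is undefined")
    return a / (b - math.log(0.5 / (0.5 - plateau)))


# Hypothetical fit parameters for one protein in two conditions.
tm_wt = melting_temperature(a=550.0, b=10.0, plateau=0.05)
tm_mutant = melting_temperature(a=520.0, b=10.0, plateau=0.05)

# A positive difference means the protein melts at a lower temperature in the mutant.
delta_tm = tm_wt - tm_mutant
print(round(tm_wt, 2), round(tm_mutant, 2), round(delta_tm, 2))
```

comparing tm values obtained this way across conditions is what allows alterations in thermal stability to be measured.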
although tpp was originally developed to study drug and ligand binding, it has also been shown to be a robust approach to probe ppis in a number of different applications (recently reviewed by mateus et al ). we recently developed a new application of tpp, referred to as mutant tpp (mtpp), that is used to study the effects of protein missense mutations on the proteome at large with the ability to focus in on specific protein complexes and their ppis . mtpp analysis is advantageous over other methods for the study of ppis in that it does not require antibodies, addition of reagents such as crosslinkers, or the genetic manipulations (such as the production of fusion proteins) typically necessary for many other ppi analyses. additionally, mtpp can be performed with significantly less starting material than traditional affinity purification or enrichment approaches, making it applicable to a wider variety of sample types. despite these advantages, we have quickly encountered challenges associated with quantitative analysis of specific target proteins and their interaction partners. therefore, a strategy for increasing the ion intensity of proteins of interest in mtpp experiments would have a significant impact on our ability to study ppi dynamics of low abundance protein complexes while still retaining the context of changes within the overall proteome.

biorxiv preprint doi: https://doi.org/ . / . . . ; this version posted december , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license (http://creativecommons.org/licenses/by-nc-nd/ . /).

one advantage of tmt- and itraq- based multiplexed workflows for global proteomics studies is that the pooling of multiple samples generates increased protein starting mate- rial that can then be subjected to extensive biochemical fractionation to facilitate deep proteome coverage - . this advantage can be coupled with protein extraction methods using denaturants such as urea or sds to isolate the full proteome of many cells and tissues . the workflow for tpp cannot exploit these advantages since: ) temperature treat- ment of lysates for tpp results in unequal levels of protein mixture across the multiplex that, in our hands, vary on average at least -fold from the lowest to the highest temperature treatment ; and ) non- denaturing protein extraction buffers must be used to maintain protein structure, ppis, and protein interactions with other molecules (includ- ing but not limited to lipids, metabolites, small molecules, and drugs) - . as a consequence, tpp workflows typically result in decreased proteome coverage relative to denaturant extracted proteomes even when equivalent amounts of starting material are used . to expand proteome coverage for our mtpp workflow, we have devel- oped a basil-like approach to increase the signal of low abundance protein complexes and their representative peptides in mtpp experi- ments using a protein complex affinity purification trigger channel in place of the phosphopeptides isobaric boosting channel used in basil . as a proof-of-concept, we investigated the ability of this ap- proach to enhance detection of the relatively low abundance protein complex cleavage and polyadenylation factor (cpf) complex in a mtpp workflow. affinity purified cpf that we have previously char- acterized - was incorporated as an isobaric trigger channel into our mtpp workflow at a ratio to the lowest heat-treated mtpp sample of ~ : and ~ : . 
using this approach, a significant increase in the abun- dance of cpf complex members was observed, including those that were not readily identified without the isobaric trigger channel. im- portantly, addition of an isobaric trigger channel into our mtpp work- flow does not appear to have a significant impact on the melt tempera- ture (tm) calculation of proteins detected both with and without the trigger. overall, the use of an isobaric trigger channel is a robust ap- proach to prioritize dda selection of proteins or peptides of interest such as missense mutant containing proteins and their interaction part- ners, which are of particular focus within mtpp experiments. experimental section yeast strains and growth all experiments were performed in saccharomyces cerevisiae. the pa- rental strain smy , described previously, was obtained from the mirkin lab and used in the trigger experiments comparing technical replicates. for the biological replicate experiments, the wildtype strain used was the commercially available by strain (open biosys- tems). the ssu - temperature sensitive mutant (first described by the hampsey lab ) was purchased from euroscarf. the pta -flag strain was made via homologous recombination. the xflag tag dna se- quence was amplified from plasmids obtained from funakoshi and hochstrasser to insert the flag epitope tag into the genome at the ’-end of the pta gene in wt (by ). successful incorporation of the flag tag was confirmed via western blot. for mtpp experiments, cells were inoculated at an od = . and grown to an od = . in yeast extract, peptone, dextrose (ypd) me- dium at permissive temperature ( °c or °c). ypd was removed by filtration through a nitrocellulose membrane (millipore, burlington, ma). cells were flash frozen with liquid nitrogen and stored at - °c to be used in subsequent sample preparation steps. for affinity purifi- cation of cpf via pta -flag, cells were grown overnight at °c in ypd to an od ~ . 
cells were pelleted, washed, and transferred to ml conical tubes for storage at - ° until subsequent sample prepa- ration steps. sample preparation by and ssu - samples for mtpp were prepared as described in peck justice et al with the exception of an extended temperature range for the heat treatment. for the no trigger mtpp experiments, lysate was treated at the following ten temperatures: untreated, °, °, . °, . °, . °, . °, . °, . °, and . °c. a tmt plex kit (thermo scientific, waltham, ma) with channels tmt ; tmt n; tmt c; tmt n; tmt c; tmt n; tmt c; tmt n; tmt c and tmt were respectively used to label peptide solutions derived from untreated, °, °, . °, . °, . °, . °, . °, . °, and . °c temperature treatments in wt. in ssu - , channels tmt ; tmt n; tmt c; tmt n; tmt c; tmt n; tmt c; tmt n; tmt c and tmt were respectively used to label peptide solutions derived from untreated, °, °, . °, . °, . °, . °c, . °, . °, and . ° temperature treatments. tmt labeling steps were performed ac- cording to manufacturer provided instructions. to boost detection of the native cpf subunits, subsequent mtpp rep- licates of wt and ssu - included the addition of a trigger channel consisting of an affinity-purified cpf complexes. affinity purification of cpf via pta -flag was performed as described previously for ssu -flag purifications . the pta -flag affinity purified sample was added at a ratio of . ug trigger to ug of the lowest heat- treated sample ( : ratio) for the initial study. the untreated samples were removed from the multiplex from no trigger samples to accom- modate for the isobaric trigger channel to be labeled with tmt . the remainder of the channels, tmt n; tmt c; tmt n; tmt c; tmt n; tmt c; tmt n; tmt c and tmt were used to label peptide solutions derived from °, °, . °, . °, . °, . °, . °, . °, and . °c temperature treat- ments. subsequent sample preparation steps were performed as de- scribed in peck justice et al . 
smy samples for independent replicate experiments were prepared as described in peck justice et al . lysate was treated at the following eight temperatures: °, °, . °, . °, . °, . °, . °, and . °c. channels tmt n; tmt c; tmt n; tmt c; tmt n; tmt c; tmt n; tmt c of a tmt plex kit (thermo scientific, waltham, ma) were used, respectively, to label peptide solutions derived from °, °, . °, . °, . °, . °, . °, and . °c temperature treatments in parental culture samples. note that some channels in the plex were used for other samples not described in this report. these heat-treated lysates were analyzed twice, as separate lc-ms experiments, for comparison of technical replicate reproducibility. in one experiment, the set of combined labeled samples was analyzed with a ninth trigger channel (tmt ) at a ratio of ug total isobaric trigger channel protein to ug of the lowest heat-treated sample ( : ratio), which included the pta -flag affinity purified material (described previously), while in the second experiment, the trigger was not added.

lc-ms/ms analysis

following multiplex preparation as described above, samples were subjected to high-ph reversed phase fractionation as previously described . nanolc-ms/ms analyses were performed on an orbitrap fusion lumos mass spectrometer (thermo scientific, waltham, ma) coupled to an easy-nlc hplc (thermo scientific, waltham, ma). one-third of the resuspended fractions were loaded onto an in-house prepared reversed phase column using bar as applied maximum pressure to an easy-nano cm column with µm reversed phase resin. 
the peptides were eluted using a -minute gradient increasing from % buffer a ( . % formic acid in water) and % buffer b ( . % formic acid in acetonitrile) to % buffer b at a flow rate of nl/min. nano-lc mobile phase was introduced into the mass spectrometer using a nanospray source (thermo scientific, waltham, ma). during peptide elution, the heated capillary temperature was kept at °c and ion spray voltage was kept at . kv. the mass spectrometer method was operated in positive ion mode for minutes with a cycle time of seconds for ms/ms acquisition. ms data were acquired by data-dependent acquisition using a top speed method following the first survey ms scan.

during ms , using a wide quadrupole isolation, survey scans were obtained with an orbitrap resolution of k with vendor defined parameters: m/z scan range, - ; maximum injection time, ; agc target, e ; micro scans, ; rf lens (%), ; "datatype", profile; polarity, positive with no source fragmentation and to include charge states to for fragmentation. dynamic exclusion for fragmentation was kept at seconds.

during ms , the following vendor defined parameters were assigned to isolate and fragment the selected precursor ions. isolation mode = quadrupole; isolation offset = off; isolation window = . ; multi-notch isolation = false; scan range mode = auto normal; firstmass = ; activation type = cid; collision energy (%) = ; activation time = ms; activation q = . ; multistage activation = false; detector type = iontrap; ion trap scan rate = turbo; maximum injection time = ms; agc target = e ; microscans = ; datatype = centroid.

during ms , daughter ions selected from neutral losses (e.g. 
h o or nh ) of precursor ion cid dur- ing ms were subjected to further fragmentation using higher-energy c-trap dissociation (hcd) to obtain tmt reporter ions and peptide specific fragment ions using following vendor defined parameters. iso- lation mode = quadrupole; isolation window = ; multi-notch isola- tion = true; ms isolation window (m/z) = ; number of notches = ; collision energy (%) = ; orbitrap resolution = k; scan range (m/z) = - ; maximum injection time = ms; agc target = e ; datatype = centroid. the data were recorded using thermo sci- entific xcalibur ( . . . ) software (copyright thermo fisher scientific inc.). protein identification and quantification resulting raw files were analyzed using proteome discoverertm . (thermo scientific, waltham, ma). the sequest ht search engine was used to search against a yeast protein database from the uniprot sequence database (december ) containing , yeast protein and common contaminant sequences (fasta file used available on prote- omexchange under accession pxd ). specific search parame- ters used were: trypsin as the proteolytic enzyme, peptides with a max of two missed cleavages, precursor mass tolerance of ppm, and a fragment mass tolerance of . da. static modifications used for the search were, ) carbamidomethylation on cysteine residues; ) tmtsixplex label on lysine (k) residues and the n-termini of peptides. dynamic modifications used for the search were oxidation of methio- nine and acetylation of n-termini. percolator false discovery rate was set to a strict setting of . . values from both unique and razor pep- tides were used for quantification. no normalization setting was used for protein quantification since the different temperature treatments are expected to have different protein amounts. the mass spectrometry proteomics data have been deposited to the proteomexchange consor- tium via the pride partner repository with the dataset identifier pxd and doi: . /pxd . 
data analysis

venn diagrams were created using venny . . dot plots, scatter plots, and waterfall plots were created using ggplot in r studio (r studio for mac, version . . ). bar graphs were created in excel (microsoft excel for mac, version . ). the tpp package (v . . ) in r studio was used to generate normalized melt curves and to determine protein melt temperatures as described previously . subsequent data processing and analysis were also performed in r studio. change in tm (Δtm) values were calculated as wt tm − ssu - tm, thereby limiting calculations to proteins detected in both wt and mutant. the data were further filtered to melt curves with r values > . and then to proteins that were detected in at least two of the three replicates. proteins were ranked according to median change in tm and ordered from the largest change (proteins that were destabilized in the mutant) to smallest change (proteins that were stabilized in the mutant). changes in tm that were outside of ± σ (σ being the standard deviation) were considered statistically significant, and the corresponding proteins were identified as destabilized or stabilized due to the mutations in ssu .

results and discussion

addition of an affinity purified isobaric trigger channel to mtpp multiplexes does not cause large changes in peptide coverage or quantitation

figure . workflow overview for mtpp with isobaric trigger channel addition. equal amounts of protein from each lysate for every biological replicate sample were subjected to different temperature treatments: °, °, . °, . °, . °, . °, . °, . °, and . °c, to induce protein denaturation. the soluble fractions from each treatment as well as a pta -flag affinity purification sample were digested in-solution with trypsin/lys-c. resulting peptides were labeled with isobaric mass tags (tmt plex) as shown and mixed prior to mass spectrometry (ms) analysis. resulting ms/ms data were analyzed using proteome discoverertm . to identify and quantify abundance levels of peptides for each temperature treatment and each biological replicate across genotypes.

we hypothesized that incorporation of a well-characterized affinity purified sample isolated from our system of interest as an isobaric trigger channel would increase ms ion intensity of peptides of interest within the tmt multiplex. as a consequence, the identification of peptides from the affinity purified protein complex would boost the identification in the remaining experimental mtpp channels used for melt curve production and subsequent tm calculation when comparing different experimental samples. similar to the approach used in basil , the incorporation of a cpf complex affinity purified from our system of interest has numerous potential advantages, including native levels of cpf post-translational modifications and protein interaction partners. similar to mtpp, the affinity purifications for the cpf complex were performed using non-denaturing buffers to preserve ppis. qualitatively, the ms/ms fragment data for cpf complexes will be improved by inclusion of the isobaric trigger channel, which increases the ion abundance of the fragments and therefore the probability of cpf identification at the peptide spectrum match (psm) level. from a quantitative perspective, tmt information will be obtained during data processing but will be excluded for interpretation of the mtpp melt curves for each protein. 
pta - xflag affinity purifications were digested with lysc/trypsin and labeled with tmt for inclusion within the mtpp multiplex. mtpp quantitative analysis and curve generation were performed using the remaining channels as described in the methods (fig. ). the mtpp samples were subjected to eight or nine different temperatures ( °, °, . °, . °, . °, . °, . °, . °, and . °c) and then centrifuged to separate soluble and insoluble material as previously described . for samples with eight temperature points, no . ° treatment sample was included. samples were then processed and subjected to lc-ms/ms analysis using an ms -based fragmentation and tmt quantitation workflow (fig. ). using sequest ht and proteome discoverer . for qualitative and quantitative analysis, between , and , proteins were detected and quantified depending on the replicate (supp. tab. ). replicates are designated as preparation , , (hence p , p , p ). the p replicate had fewer ids overall but p and p had very similar peptide detection levels (supp. tab. ). to gain insights into general trends with the quantitative data, dot plots were generated to show the abundance value for each quantified protein (fig. ). consistent with previous mtpp experiments , there was an overall decrease in protein abundance as the temperature at which the sample was treated increased. importantly, incorporation of a protein complex isobaric trigger channel into the multiplex did not alter the overall trend of decreasing protein abundance with increased temperature (figure b&d) or have a significant effect on the number of proteins detected. the average ion abundance at each temperature treatment also remained consistent between samples with or without the isobaric trigger channel (compare figure a to b and c to d).
finally, the average quantitative ratio of the isobaric trigger channel to the mtpp experimental sample processed at °c remained consistent at : (figure b) or : (figure d), mirroring the ratios used for mixing of the multiplex.

the impact of the trigger on mtpp analysis was investigated using both technical replicates and biological replicates so that we could evaluate differences in our workflow and their impact on qualitative and quantitative parameters. for the technical replicates, the same labeled samples were split into two tmt multiplexes: one multiplex without an isobaric cpf trigger (no trigger) and one multiplex with an isobaric cpf trigger labeled with tmt (trigger) with a quantitative ratio (based on protein assays) to lowest temperature treatment of ~ : . for the biological replicates, four biological replicate samples were grown and prepared independently of one another. one replicate contained a non-heat treated (untreated) sample that was labeled with tmt (no trigger sample) and the remaining three replicates were multiplexes with a cpf trigger labeled with tmt (trigger) with a trigger to lowest temperature treatment ratio of ~ : .

while there was not an obvious effect on the overall abundance of proteins in the samples, it is possible that the trigger could affect the detection and identification of proteins by biasing the mass spectrometer towards proteins present in the affinity purification. comparisons of ms-based measurements across the technical replicates showed that the trigger channel incorporation did not have a significant impact on protein identification and quantification (fig. a). technical replicate analyses showed very similar numbers of detected psms, peptides, and proteins, suggesting that the addition of the trigger channel at a ratio of : has little impact on overall lc-ms/ms detection (fig. a, yellow).
the biological replicates showed more variation across samples, which is attributed to their separate processing for tpp in addition to variation that could occur from trypsin digestion and other processing steps , . trigger p in the biological replicate study did have overall lower levels of proteins detected but this was not likely a consequence of trigger channel addition considering that trigger p and trigger p samples had similar detection levels to the no trigger sample (fig. a, green). direct comparison of proteins quantified in the no trigger vs. trigger samples showed an % overlap in quantified proteins with unique proteins present in all individual datasets (figure b&c). overall, these data suggest that the addition of an isobaric trigger channel has little to no impact on overall proteome detection outside of the inherent variability seen in independent sample processing (for the biological replicates) and lc-ms/ms runs.

figure . the use of an isobaric trigger channel does not alter mtpp experimental channel abundance values. dot plots of protein abundance values for each protein detected in wt cells in technical replicates without (a) and with (b) the isobaric trigger channel (trigger) addition and representative biological replicates without (c) and with (d) the isobaric trigger addition. the same general decrease of protein abundances with increase in temperature treatment is seen across all replicates. dot plots for additional replicates are provided in supp. fig. .
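the comparison of quantified proteins between the no trigger and trigger datasets described above amounts to a set overlap. a minimal python sketch follows, assuming each dataset is reduced to a list of protein accessions; the function name `overlap_summary`, the example accessions, and the choice to express overlap as a percentage of the union are illustrative assumptions, not the authors' workflow (the venn diagrams in the paper were created with venny).

```python
def overlap_summary(no_trigger, trigger):
    """Summarize the overlap between two lists of quantified protein accessions.

    The overlap is reported relative to the union of both datasets; this
    denominator is an assumption, since the text does not state which one was used.
    """
    a, b = set(no_trigger), set(trigger)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_no_trigger": len(a - b),
        "only_trigger": len(b - a),
        "percent_of_union": 100.0 * len(shared) / len(union) if union else 0.0,
    }


# hypothetical accessions, for illustration only
no_trigger = ["P1", "P2", "P3", "P4"]
trigger = ["P2", "P3", "P4", "P5"]
summary = overlap_summary(no_trigger, trigger)
print(summary)
```

reporting the dataset-specific counts alongside the shared count mirrors the three regions of a two-set venn diagram.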
a critical feature of mtpp analysis is the ability to accurately calculate melt temperature (tm) from the resulting melt curves. to ensure that incorporation of the trigger did not have major impacts on tm calculation of proteins outside of the cpf complex, we performed pearson correlation analysis of the tms of proteins detected in both the no trigger and trigger samples (figure d, tm data from the tpp package in supp. tab. ). this analysis showed a high degree of correlation of . between the no trigger and trigger samples for proteins which met the criteria for quantitation in our mtpp data analysis workflow (including the number of proteins with melt curves having an r greater than or equal to . ). additionally, even across biological replicates, there is a strong positive correlation of . between tm calculations in the no trigger vs. trigger samples (figure e, tm data from the tpp package in supp. tab. ). the ability to make comparisons using biological replicate data would be beneficial in settings with limiting samples where technical replicates may not be feasible, in addition to their importance for rigorous statistical analysis.

an isobaric trigger channel facilitates mtpp analysis of the cleavage and polyadenylation factor complex

cpf and its accessory factors cleavage factor ia and ib play major roles in rna processing. cpf is responsible for efficient and specific cleavage and polyadenylation of messenger rnas , and has been shown to have important roles in termination of rna polymerase ii transcription , . the cpf complex is currently described as having subunits (figure a) which provide the complex with numerous activities including endonuclease, polyadenylation, and phosphatase functions . ssu , which is mutated in the ssu - yeast strain, is an integral subunit of cpf (fig. a, indicated with a star).
performing mtpp according to the established protocol resulted in limited detection of cpf (figure c-f, no trigger samples shown in dark/light gray). one notable exception to the low detection of cpf was the subunit glc . along with its presence in cpf, glc is also the catalytic subunit of pp and thereby functions in many other protein complexes in eukaryotic cells (reviewed in , ) where it plays roles in cell cycle regulation and nutrient regulation , , . due to these many roles, glc has a higher global abundance than other cpf subunits and is thereby more readily detected.

previously performed experiments found that the entire cpf complex copurifies with flag-tagged pta . in theory, addition of an affinity purified cpf sample to one channel of the tmt multiplex would increase the ms ion intensity of cpf subunits and would “trigger” the mass spectrometer to pick peptides from cpf complex subunits more often in a dda analysis than in samples that lack an isobaric trigger. we have previously shown that psm level detection of affinity purified protein complexes results in highly reproducible quantitation of protein complexes in label-free quantitation workflows , . this prior work found that rna polymerase ii complex digestions result in the generation of a number of highly detectable peptides and it is likely that this would also be the case for cpf affinity purifications . if these findings hold true, there should be a significant overlap in unique peptide identifications across the independent lc-ms/ms runs for biological replicates. as shown in fig. b, a significant overlap of unique peptides from cpf complex subunits was identified across the three biological replicates containing the isobaric cpf trigger (peptide data provided in supp. tab. ). due to the lower overall protein levels in the trigger p sample, a higher level of unique peptide overlap was also observed between trigger p and p than was observed between p /p or p /p (fig. b).
from an individual subunit perspective, incorporation of the isobaric pta -flag trigger channel substantially increased identification of most cpf subunits (figure c-f, colored samples). while similar levels of glc were detected across all samples, detection of other complex members was improved significantly in the presence of the isobaric cpf trigger channel. in fact, some cpf subunits that were not detected in no trigger samples (such as cft , cft , and pfs ) were detected by hundreds of psms when the isobaric cpf trigger channel was used (fig. c & d). the increased level of psm detection was accompanied by increased normalized ion abundance (fig. e & f). overall, these data support that we can specifically increase reproducible detection and quantitation of proteins of interest for thermal profiling experiments using an isobaric affinity purified trigger channel.

figure . dataset comparisons from isobaric trigger channel addition. a) summary of lc-ms/ms data in technical and biological replicates with and without isobaric trigger channel addition. venn diagrams comparing quantified proteins in no trigger (gray) vs. trigger (yellow/green) in b) technical replicates and c) biological replicate using trigger p . correlation plot of the calculated tms in no trigger vs. trigger in d) technical replicates and e) biological replicates. the blue line represents the linear fit of the data.

mutations in ssu - do not impact the thermal stability of the cpf protein complex

the cpf complex contains two protein phosphatases, glc and ssu . ssu is an integral component of cpf and its function is required for proper termination and ’-end processing of rnas - . additionally, its interactions with tfiib have been shown to be critical for the formation of gene loops, which regulate gene expression by linking transcription termination and initiation factors - . much of the characterization of ssu has been accomplished through studies using the ssu - mutant yeast strain , , , . the ssu - ts mutant contains a single mutation, r a, that confers temperature sensitivity at °c. this mutation impairs the catalytic activity of ssu , leading to a decrease in transcription elongation efficiency , and defects in gene looping , . whether the disrupted phosphatase function in the ssu - mutant affects the thermal stability of ssu or the cpf complex had not been previously examined.

detection of cpf with and without the trigger channel resulted in similar numbers of cpf subunit psms in ssu - as in wt, which facilitates mtpp analysis of cpf complex thermal stability from a quantitative perspective (fig. c&d). protein melt curve analysis using the tpp r package (fig. a, mtpp result data in supp. tab. ) showed no obvious changes in any of the cpf subunits in ssu - relative to wt. using all biological replicate data, we can define statistically significant changes in protein thermal stability as any Δtm which falls at least two standard deviations above or below the average Δtm across the three ssu - replicates relative to wt. whole proteome analysis of Δtm using mtpp found statistically significant decreases in the thermal stability of proteins and increases in the thermal stability of proteins in ssu - cells (fig. b, supp. tab. ). go term analysis of proteins that had a significant change in thermal stability in ssu - showed a . -fold enrichment in proteins involved in nucleobase-containing compound biosynthetic process with a p-value of . e- .
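the Δtm filtering and significance rule described in the data analysis section and applied here can be sketched in python as follows. this is a hedged illustration, not the authors' r code: the input layout, the function names `median_delta_tm` and `classify`, the pairing of wt and mutant by replicate label, and the choice to compute the mean and standard deviation over the per-protein median Δtm values are all assumptions, and the curve-fit cutoff `r2_min=0.9` is a placeholder because the actual threshold is not legible in this text.

```python
from statistics import mean, median, stdev


def median_delta_tm(wt, mutant, r2_min, min_replicates=2):
    """Return {protein: median dTm}, where dTm = WT Tm - mutant Tm.

    wt, mutant: {replicate: {protein: (tm, r2)}} (hypothetical layout).
    A replicate contributes only if the protein was fit in both strains
    and both melt-curve fits exceed r2_min.
    """
    per_protein = {}
    for rep, wt_fits in wt.items():
        mut_fits = mutant.get(rep, {})
        for protein, (wt_tm, wt_r2) in wt_fits.items():
            if protein not in mut_fits:
                continue  # dTm is only defined for proteins seen in both strains
            mut_tm, mut_r2 = mut_fits[protein]
            if wt_r2 > r2_min and mut_r2 > r2_min:
                per_protein.setdefault(protein, []).append(wt_tm - mut_tm)
    return {p: median(v) for p, v in per_protein.items() if len(v) >= min_replicates}


def classify(delta_tm, n_sd=2.0):
    """Label proteins whose dTm lies at least n_sd standard deviations from the mean."""
    values = list(delta_tm.values())
    center, spread = mean(values), stdev(values)
    upper, lower = center + n_sd * spread, center - n_sd * spread
    calls = {}
    for protein, d in delta_tm.items():
        if d >= upper:
            calls[protein] = "destabilized"  # lower Tm in the mutant than in WT
        elif d <= lower:
            calls[protein] = "stabilized"    # higher Tm in the mutant than in WT
        else:
            calls[protein] = "unchanged"
    return calls


# synthetic example data: ten proteins, three paired replicates
reps = ("p1", "p2", "p3")
wt = {r: {f"prot{i}": (50.0, 0.95) for i in range(10)} for r in reps}
mutant = {r: {f"prot{i}": (50.0, 0.95) for i in range(10)} for r in reps}
for r in reps:
    mutant[r]["prot0"] = (40.0, 0.95)  # consistently lower Tm in the mutant
mutant["p1"]["prot1"] = (50.0, 0.50)   # poor fit in one replicate: kept (2 of 3)
mutant["p1"]["prot2"] = (50.0, 0.50)   # poor fit in two replicates: dropped
mutant["p2"]["prot2"] = (50.0, 0.50)

delta = median_delta_tm(wt, mutant, r2_min=0.9)  # 0.9 is a placeholder cutoff
calls = classify(delta, n_sd=2.0)
print(calls)
```

ranking the entries of `delta` from largest to smallest value reproduces the ordering used for the waterfall plot, with destabilized proteins first.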
these results suggest that the defects in transcription caused by disrupted catalytic activity of ssu in this mutant strain are not due to impacts on the stability of ssu or cpf. however, secondary effects of ssu - functional disruption have been associated with changes in the nrd -nab -sen complex activity which impact a variety of processes including gtp production , , . the temperature sensitivity of this strain is instead likely to be a result of a need for efficient transcription at higher temperatures in order to respond to heat stress , . a deeper investigation into the proteins with changes in thermal stability will help to further elucidate the impacts of this catalytic mutant on gene expression.

conclusions

the integration of an isobaric affinity-purified protein complex trigger channel increased our ability to analyze the low abundance protein complex cpf via mtpp. our analysis did not observe major effects on the tm estimates of unrelated proteins present in the cell. protocols for affinity purification would need to be optimized for purity and specificity for optimal use as an isobaric trigger channel. however, since protein complex digestion results in detection of a highly reproducible peptide population, a reasonable alternative approach could include use of a population of purified synthetic peptides or digested recombinant proteins. the use of natively expressed purifications from the system of interest, however, has distinct advantages such as native protein processing, post-translational modifications, and protein interaction partners.

use of isobaric purified protein complex trigger channels in tpp studies, and potentially other global proteomics applications, will improve the ability to perform proteomic analysis of low abundance protein complexes and measure systems-level perturbations due to genetic variation(s).
the potential for this method to be used across different organisms, even those from which it is difficult to obtain large amounts of protein, is further supported by the adaptation of basil for single-cell phosphoproteomics . as many biologically relevant, as well as disease relevant, protein complexes are of relatively low abundance in the cell , improvements in the reproducible detection of such proteins in proteomics experiments would be beneficial to increasing our understanding of the critical cellular mechanisms in normal and disease states.

figure . peptide detection and quantitation for subunits of the cleavage and polyadenylation factor complex present in the pta -flag isobaric trigger channel. a) model of cpf adapted from casañal et al . the red star denotes the mutant protein used in these studies, ssu - ; the white square denotes the flag-tagged subunit used for the trigger channel affinity purification, pta . b) venn diagram showing the unique peptides detected for cpf subunits across each wt biological replicate. number of psms for cpf subunits in each c) wt and d) ssu - replicate experiment. ion abundance for cpf subunits normalized to abundance of pgk (x ) in each e) wt and f) ssu - replicate experiment.

figure . effects of ssu - on cpf complex stability and the global proteome. a) mtpp normalized cpf subunit melt curves. plots for each of the cpf subunits normalized by the tpp package for a representative replicate, trigger p . curves shown in gray are wt and turquoise are ssu - . each line represents one of the cpf subunits. replicates for a are provided in supp. fig. . b) waterfall plots visualizing whole proteome changes in melt temperature (tm), wt - ssu - . a total of , proteins were ordered according to change in tm and plotted. shown are median values for proteins that were quantified in at least two replicates. dotted lines signify a confidence interval of %. there were significant decreases in thermal stability of proteins and significant increases in thermal stability of proteins. change in tm and median values provided in supp. tab. .

supplementary material

the supplementary material is available as a pdf and associated xls tables.

author information

corresponding author
*e-mail: almosley@iu.edu
telephone: ( ) - ; fax: ( ) -

orcid
sarah a. peck justice: - - - x
neil a. mccracken: - - - x
josé f. victorino: - - -
aruna b. wijeratne: - - -
amber l. mosley: - - -

present addresses
†department of biology, taylor university, upland, indiana, , united states
‡translational genomics research institute, phoenix, arizona, , united states

author contributions
s.a.p.j.: designed and performed mtpp experiments on biological replicates, analyzed data, prepared the figures, and wrote the manuscript. n.a.m.: performed technical replicate mtpp experiments and contributed to the manuscript. j.f.v.: affinity purified cpf and confirmed purification via ap-ms (data shown elsewhere). a.b.w.: contributed to the design of experiments. a.l.m.: oversaw various aspects of the project and provided funding for the project, provided direction on data analysis and figure preparation, and wrote the manuscript. the manuscript was written through contributions of all authors. all authors have given approval to the final version of the manuscript.

notes
the authors declare no competing financial interests.
acknowledgments

we would like to thank the current members of the mosley lab: whitney smith-kinnaman, katlyn hughes burriss, lynn bedard, dominique baldwin, h.r. sagara wijeratne, gitanjali roy, and the iusm proteomics core: emma doud and guihong qi. a portion of the funding for this project was provided by national institutes of health t hl (sapj) and by the showalter research trust (alm). nam was supported in part by the indiana university diabetes and obesity research training program, devault fellowship. this project was supported, in part, by the indiana clinical and translational sciences institute, which is funded by award number ul tr from the national institutes of health, national center for advancing translational sciences, clinical and translational sciences award. acquisition of the iusm proteomics core instrumentation used for this project was funded by the indiana university precision health initiative. some of the tmt reagents were graciously provided via the thermo scientific tmt research award (sapj). the content is solely the responsibility of the authors and does not necessarily represent the official views of the national institutes of health.

references

. sahni, n.; yi, s.; taipale, m.; fuxman bass, j. i.; coulombe-huntington, j.; yang, f.; peng, j.; weile, j.; karras, g. i.; wang, y.; kovacs, i. a.; kamburov, a.; krykbaeva, i.; lam, m. h.; tucker, g.; khurana, v.; sharma, a.; liu, y. y.; yachie, n.; zhong, q.; shen, y.; palagi, a.; san-miguel, a.; fan, c.; balcha, d.; dricot, a.; jordan, d. m.; walsh, j. m.; shah, a. a.; yang, x.; stoyanova, a. k.; leighton, a.; calderwood, m. a.; jacob, y.; cusick, m. e.; salehi-ashtiani, k.; whitesell, l. j.; sunyaev, s.; berger, b.; barabasi, a. l.; charloteaux, b.; hill, d. e.; hao, t.; roth, f. p.; xia, y.; walhout, a. j. m.; lindquist, s.; vidal, m., widespread macromolecular interaction perturbations in human genetic disorders. cell , ( ), - . . huttlin, e. l.; bruckner, r.
j.; paulo, j. a.; cannon, j. r.; ting, l.; baltier, k.; colby, g.; gebreab, f.; gygi, m. p.; parzen, h.; szpyt, j.; tam, s.; zarraga, g.; pontano-vaites, l.; swarup, s.; white, a. e.; schweppe, d. k.; rad, r.; erickson, b. k.; obar, r. a.; guruharsha, k. g.; li, k.; artavanis-tsakonas, s.; gygi, s. p.; harper, j. w., architecture of the human interactome defines protein communities and disease networks. nature , ( ), - . . chick, j. m.; munger, s. c.; simecek, p.; huttlin, e. l.; choi, k.; gatti, d. m.; raghupathy, n.; svenson, k. l.; churchill, g. a.; gygi, s. p., defining the consequences of genetic variation on a proteome-wide scale. nature , ( ), - . . gavin, a. c.; bosche, m.; krause, r.; grandi, p.; marzioch, m.; bauer, a.; schultz, j.; rick, j. m.; michon, a. m.; cruciat, c. m.; remor, m.; hofert, c.; schelder, m.; brajenovic, m.; ruffner, h.; merino, a.; klein, k.; hudak, m.; dickson, d.; rudi, t.; gnau, v.; bauch, a.; bastuck, s.; huhse, b.; leutwein, c.; heurtier, m. a.; copley, r. r.; edelmann, a.; querfurth, e.; rybin, v.; drewes, g.; raida, m.; bouwmeester, t.; bork, p.; seraphin, b.; kuster, b.; neubauer, g.; superti-furga, g., functional organization of the yeast proteome by systematic analysis of protein complexes. nature , ( ), - . . lambert, j. p.; ivosev, g.; couzens, a. l.; larsen, b.; taipale, m.; lin, z. y.; zhong, q.; lindquist, s.; vidal, m.; aebersold, r.; pawson, t.; bonner, r.; tate, s.; gingras, a. c., mapping differential interactomes by affinity purification coupled with data-independent mass spectrometry acquisition. nat methods , ( ), - . . go, c. d.; knight, j. d. r.; rajasekharan, a.; rathod, b.; hesketh, g. g.; abe, k. t.; youn, j.-y.; samavarchi-tehrani, p.; zhang, h.; zhu, l. y.; popiel, e.; lambert, j.-p.; coyaud, É.; cheung, s. w. t.; rajendran, d.; wong, c. j.; antonicka, h.; pelletier, l.; raught, b.; palazzo, a. f.; shoubridge, e. a.; gingras, a.-c., a proximity biotinylation map of a human cell. biorxiv . . 
rolland, t.; tasan, m.; charloteaux, b.; pevzner, s. j.; zhong, q.; sahni, n.; yi, s.; lemmens, i.; fontanillo, c.; mosca, r.; kamburov, a.; ghiassian, s. d.; yang, x.; ghamsari, l.; balcha, d.; begg, b. e.; braun, p.; brehme, m.; broly, m. p.; carvunis, a. r.; convery-zupan, d.; corominas, r.; coulombe-huntington, j.; dann, e.; dreze, m.; dricot, a.; fan, c.; franzosa, e.; gebreab, f.; gutierrez, b. j.; hardy, m. f.; jin, m.; kang, s.; kiros, r.; lin, g. n.; luck, k.; macwilliams, a.; menche, j.; murray, r. r.; palagi, a.; poulin, m. m.; rambout, x.; rasla, j.; reichert, p.; romero, v.; ruyssinck, e.; sahalie, j. m.; scholz, a.; shah, a. a.; sharma, a.; shen, y.; spirohn, k.; tam, s.; tejeda, a. o.; trigg, s. a.; twizere, j. c.; vega, k.; walsh, j.; cusick, m. e.; xia, y.; barabasi, a. l.; iakoucheva, l. m.; aloy, p.; de las rivas, j.; tavernier, j.; calderwood, m. a.; hill, d. e.; hao, t.; roth, f. p.; vidal, m., a proteome-scale map of the human interactome network. cell , ( ), - . . aebersold, r.; mann, m., mass-spectrometric exploration of proteome structure and function. nature , ( ), - . . altelaar, a. f.; munoz, j.; heck, a. j., next-generation proteomics: towards an integrative view of proteome dynamics. nat rev genet , ( ), - . . meier, f.; geyer, p. e.; virreira winter, s.; cox, j.; mann, m., boxcar acquisition method enables single-shot proteomics at a depth of , proteins in minutes. nature methods , ( ), - . . potel, c. m.; lin, m.-h.; heck, a. j.
r.; lemeer, s., defeating major contaminants in fe +- immobilized metal ion affinity chromatography (imac) phosphopeptide enrichment. molecular & cellular proteomics , ( ), - . . humphrey, s. j.; azimifar, s. b.; mann, m., high-throughput phosphoproteomics reveals in vivo insulin signaling dynamics. nature biotechnology , ( ), - . . specht, h.; slavov, n., optimizing accuracy and depth of protein quantification in experiments using isobaric carriers. j proteome res . . slavov, n., single-cell protein analysis by mass spectrometry. curr opin chem biol , , - . . zhu, y.; scheibinger, m.; ellwanger, d. c.; krey, j. f.; choi, d.; kelly, r. t.; heller, s.; barr-gillespie, p. g., single-cell proteomics reveals changes in expression during hair-cell development. elife , . . budnik, b.; levy, e.; harmange, g.; slavov, n., scope-ms: mass spectrometry of single mammalian cells quantifies proteome heterogeneity during cell differentiation. genome biol , ( ), . . yi, l.; tsai, c. f.; dirice, e.; swensen, a. c.; chen, j.; shi, t.; gritsenko, m. a.; chu, r. k.; piehowski, p. d.; smith, r. d.; rodland, k. d.; atkinson, m. a.; mathews, c. e.; kulkarni, r. n.; liu, t.; qian, w. j., boosting to amplify signal with isobaric labeling (basil) strategy for comprehensive quantitative phosphoproteomic characterization of small populations of cells. anal chem , ( ), - . . mcalister, g. c.; huttlin, e. l.; haas, w.; ting, l.; jedrychowski, m. p.; rogers, j. c.; kuhn, k.; pike, i.; grothe, r. a.; blethrow, j. d.; gygi, s. p., increasing the multiplexing capacity of tmts using reporter ion isotopologues with isobaric masses. anal chem , ( ), - . . thompson, a.; schafer, j.; kuhn, k.; kienle, s.; schwarz, j.; schmidt, g.; neumann, t.; johnstone, r.; mohammed, a. k.; hamon, c., tandem mass tags: a novel quantification strategy for comparative analysis of complex protein mixtures by ms/ms. anal chem , ( ), - . . 
thompson, a.; wolmer, n.; koncarevic, s.; selzer, s.; bohm, g.; legner, h.; schmid, p.; kienle, s.; penning, p.; hohle, c.; berfelde, a.; martinez-pinna, r.; farztdinov, v.; jung, s.; kuhn, k.; pike, i., tmtpro: design, synthesis, and initial evaluation of a proline-based isobaric -plex tandem mass tag reagent set. anal chem , ( ), - . . tsai, c. f.; zhao, r.; williams, s. m.; moore, r. j.; schultz, k.; chrisler, w. b.; pasa-tolic, l.; rodland, k. d.; smith, r. d.; shi, t.; zhu, y.; liu, t., an improved boosting to amplify signal with isobaric labeling (ibasil) strategy for precise quantitative single-cell proteomics. mol cell proteomics , ( ), - . . chua, x. y.; mensah, t.; aballo, t. j.; mackintosh, s. g.; edmondson, r. d.; salomon, a. r., tandem mass tag approach utilizing pervanadate boost channels delivers deeper quantitative characterization of the tyrosine phosphoproteome. mol cell proteomics , mcp.tir . . . klann, k.; tascher, g.; munch, c., functional translatome proteomics reveal converging and dose-dependent regulation by mtorc and eif alpha. mol cell , ( ), - e . . yamamoto, w. r.; bone, r. n.; sohn, p.; syed, f.; reissaus, c. a.; mosley, a. l.; wijeratne, a. b.; true, j. d.; tong, x.; kono, t.; evans-molina, c., endoplasmic reticulum stress alters ryanodine receptor function in the murine pancreatic beta cell. j biol chem , ( ), - . . savitski, m. m.; reinhard, f. b.; franken, h.; werner, t.; savitski, m. f.; eberhard, d.; martinez molina, d.; jafari, r.; dovega, r. b.; klaeger, s.; kuster, b.; nordlund, p.; bantscheff, m.; drewes, g., tracking cancer drugs in living cells by thermal profiling of the proteome. science , ( ), . . franken, h.; mathieson, t.; childs, d.; sweetman, g. m.; werner, t.; togel, i.; doce, c.; gade, s.; bantscheff, m.; drewes, g.; reinhard, f. b.; huber, w.; savitski, m. m., thermal proteome profiling for unbiased identification of direct and indirect drug targets using multiplexed quantitative mass spectrometry. 
structural insights into cullin -ring ubiquitin ligase remodelling by vpr from simian immunodeficiency viruses

sofia banchenko¶, ferdinand krupp¶, christine gotthold, jörg bürger, andrea graziadei, francis o’reilly, ludwig sinn, olga ruda, juri rappsilber, christian m. t. spahn, thorsten mielke, ian a. taylor, david schwefel*

institute of medical physics and biophysics, charité – universitätsmedizin berlin, corporate member of freie universität berlin, humboldt-universität zu berlin, and berlin institute of health, berlin, germany
microscopy and cryo-electron microscopy service group, max-planck-institute for molecular genetics, berlin, germany
bioanalytics unit, institute of biotechnology, technische universität berlin, berlin, germany
wellcome centre for cell biology, university of edinburgh, edinburgh, united kingdom
macromolecular structure laboratory, the francis crick institute, london, united kingdom

*corresponding author e-mail: david.schwefel@charite.de (ds)
¶these authors contributed equally to this work

(which was not certified by peer review) is the author/funder. all rights reserved. no reuse allowed without permission. the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . .

abstract

viruses have evolved means to manipulate the host’s ubiquitin-proteasome system, in order to down-regulate antiviral host factors. the vpx/vpr family of lentiviral accessory proteins usurp the substrate receptor dcaf of host cullin -ring ligases (crl ), a family of modular ubiquitin ligases involved in dna replication, dna repair and cell cycle regulation. crl dcaf specificity modulation by vpx and vpr from certain simian immunodeficiency viruses (siv) leads to recruitment, poly-ubiquitylation and subsequent proteasomal degradation of the host restriction factor samhd , resulting in enhanced virus replication in differentiated cells. to unravel the mechanism of siv vpr-induced samhd ubiquitylation, we conducted integrative biochemical and structural analyses of the vpr protein from sivs infecting cercopithecus cephus (sivmus).
x-ray crystallography reveals commonalities between sivmus vpr and other members of the vpx/vpr family with regard to dcaf interaction, while cryo-electron microscopy and cross-linking mass spectrometry highlight a divergent molecular mechanism of samhd recruitment. in addition, these studies demonstrate how sivmus vpr exploits the dynamic architecture of the multi-subunit crl dcaf assembly to optimise samhd ubiquitylation. together, the present work provides detailed molecular insight into variability and species-specificity of the evolutionary arms race between host samhd restriction and lentiviral counteraction through vpx/vpr proteins.

author summary

due to the limited size of virus genomes, virus replication critically relies on host cell components. in addition to the host cell’s energy metabolism and its dna replication and protein synthesis apparatus, the protein degradation machinery is an attractive target for viral re-appropriation. certain viral factors divert the specificity of host ubiquitin ligases to antiviral host factors, in order to mark them for destruction by the proteasome, to lift intracellular barriers to virus replication. here, we present molecular details of how the simian immunodeficiency virus accessory protein vpr interacts with a substrate receptor of host cullin -ring ubiquitin ligases, and how this interaction redirects the specificity of cullin -ring to the antiviral host factor samhd . the studies uncover the mechanism of vpr-induced samhd recruitment and subsequent ubiquitylation.
moreover, by comparison to related accessory proteins from other immunodeficiency virus species, we illustrate the surprising variability in the molecular strategies of samhd counteraction, which these viruses adopted during evolutionary adaptation to their hosts. lastly, our work also provides deeper insight into the inner workings of the host’s cullin -ring ubiquitylation machinery.

introduction

a large proportion of viruses have evolved means to co-opt their host’s ubiquitylation machinery, in order to improve replication conditions, either by introducing viral ubiquitin ligases and deubiquitinases, or by modification of host proteins involved in ubiquitylation [ - ]. in particular, host ubiquitin ligases are a prominent target for viral usurpation, to redirect specificity towards antiviral host restriction factors. this results in recruitment of restriction factors as non-endogenous neo-substrates, inducing their poly-ubiquitylation and subsequent proteasomal degradation [ - ]. this counteraction of the host’s antiviral repertoire is essential for virus infectivity and spread [ - ], and mechanistic insights into these specificity changes extend our understanding of viral pathogenesis and might pave the way for novel treatments. frequently, virally encoded modifying proteins associate with, and adapt the cullin -ring ubiquitin ligases (crl ) [ ]. crl consists of a cullin (cul ) scaffold that bridges the catalytic ring-domain subunit roc to the adaptor protein ddb , which in turn binds to exchangeable substrate receptors (dcafs, ddb - and cul -associated factors) [ - ].
in some instances, the ddb adaptor serves as an anchor for virus proteins, which then act as “viral dcafs” to recruit the antiviral substrate. examples are the simian virus v protein and mouse cytomegalovirus m , which bind to ddb and recruit stat / proteins for ubiquitylation, in order to interfere with the host’s interferon response [ - ]. similarly, cul -dependent downregulation of stat signalling is important for west nile virus replication [ ]. in addition, the hepatitis b virus x protein hijacks ddb to induce proteasomal destruction of the structural maintenance of chromosome (smc) complex to promote virus replication [ , ]. viral factors also bind to and modify dcaf receptors in order to redirect them to antiviral substrates. prime examples are the lentiviral accessory proteins vpr and vpx. all contemporary human and simian immunodeficiency viruses (hiv/siv) encode vpr, while only two lineages, represented by hiv- and siv infecting mandrills, carry vpx [ ]. vpr and vpx proteins are packaged into progeny virions and released into the host cell upon infection, where they bind to dcaf in the nucleus [ ]. in this work, corresponding simian immunodeficiency virus vpx/vpr proteins will be indicated with their host species as subscript, with the following abbreviations used: mus – moustached monkey (cercopithecus cephus), mnd – mandrill (mandrillus sphinx), rcm – red-capped mangabey (cercocebus torquatus), sm – sooty mangabey (cercocebus atys), deb – de brazza’s monkey (cercopithecus neglectus), syk – syke’s monkey (cercopithecus albogularis), agm – african green monkey (chlorocebus spec). vprhiv- is important for virus replication in vivo and in macrophage infection models [ ].
recent proteomic analyses revealed that dcaf specificity modulation by vprhiv- proteins results in down-regulation of hundreds of host proteins in a dcaf - and proteasome-dependent manner [ ], including the previously reported vprhiv- degradation targets ung [ ], hltf [ ], mus [ , ], mcm [ ] and tet [ ]. this surprising promiscuity in degradation targets is also partially conserved in more distant clades exemplified by vpragm and vprmus [ ]. however, vpr pleiotropy, and the lack of easily accessible experimental models, have prevented a characterisation of how these degradation events precisely promote replication [ ]. by contrast, vpx exhibits a much narrower substrate range. it has recently been reported to target stimulator of interferon genes (sting) and components of the human silencing hub (hush) complex for degradation, leading to inhibition of antiviral cgas-sting-mediated signalling and reactivation of latent proviruses, respectively [ - ]. importantly, vpx also recruits the samhd restriction factor to dcaf , in order to mark it for proteasomal destruction [ , ]. samhd is a deoxynucleotide triphosphate (dntp) triphosphohydrolase that restricts retroviral replication in non-dividing cells by lowering the dntp pool to levels that cannot sustain viral reverse transcription [ - ]. retroviruses that express vpx are able to alleviate samhd restriction and allow replication in differentiated myeloid lineage cells, resting t cells and memory t cells [ , , ]. as a result of the constant evolutionary arms race between the host’s samhd restriction and its viral antagonist vpx, the mechanism of vpx-mediated samhd recruitment is highly virus species- and strain-specific: the vpx clade represented by vpxhiv- recognises the samhd c-terminal domain (ctd), while vpxmnd /rcm binds the samhd n-terminal domain (ntd) in a fundamentally different way [ , - ].
in the course of evolutionary adaptation to their primate hosts, and due to selective pressure to evade samhd restriction, two groups of sivs that do not have vpx, sivagm, and sivdeb/mus/syk, neo-functionalised their vpr to bind samhd and induce its degradation [ , , ]. consequently, these species evolved “hybrid” vpr proteins that retain targeting of some host factors depleted by hiv- -type vpr [ ], and additionally induce samhd degradation. to uncover the molecular mechanisms of dcaf - and samhd -interaction of such a “hybrid” vpr, we initiated integrative biochemical and structural analyses of the vpr protein from an siv infecting cercopithecus cephus, vprmus. these studies reveal similarities and differences to vpx and vpr proteins from other lentivirus species and pinpoint the divergent molecular mechanism of vprmus-dependent samhd recruitment to crl dcaf . furthermore, cryo-electron microscopic (cryo-em) reconstructions of a vprmus-modified crl dcaf protein complex allow for insights into the structural plasticity of the entire crl ubiquitin ligase assembly, with implications for the ubiquitin transfer mechanism.

results

samhd -ctd is necessary and sufficient for vprmus-binding and ubiquitylation in vitro

to investigate the molecular interactions between vprmus, the neo-substrate samhd from rhesus macaque and crl subunits ddb /dcaf c-terminal domain (dcaf -ctd), protein complexes were reconstituted in vitro from purified components and analysed by gel filtration (gf) chromatography. the different protein constructs that were employed are shown schematically in s a fig.
vprmus is insoluble after removal of the gst affinity purification tag (s b fig) and accordingly could not be applied to the gf column. no interaction of samhd with ddb /dcaf -ctd could be detected in the absence of vprmus (s c fig). analysis of binary protein combinations (vprmus and ddb /dcaf -ctd; vprmus and samhd ) shows that vprmus elutes in a single peak together with ddb /dcaf -ctd (s d fig) or with samhd (s e fig). incubation of vprmus with ddb /dcaf b and samhd followed by gf resulted in elution of all three components in a single peak (fig a, b, red trace). together, these results show that vprmus forms stable binary and ternary protein complexes with ddb /dcaf -ctd and/or samhd in vitro. furthermore, incubation with any of these interaction partners apparently stabilises vprmus by alleviating its tendency for aggregation/insolubility. previous cell-based assays indicated that residues - of rhesus macaque samhd (samhd -ctd) are necessary for vprmus-induced proteasomal degradation [ ]. to test this finding in our in vitro system, constructs containing samhd -ctd fused to t lysozyme (t l-samhd -ctd), or lacking samhd -ctd (samhd -Δctd, fig a), were incubated with vprmus and ddb /dcaf -ctd, and complex formation was assessed by gf chromatography. analysis of the resulting chromatograms by sds-page shows that samhd -Δctd did not co-elute with ddb /dcaf -ctd/vprmus (fig a, b, green trace). by contrast, t l-samhd -ctd accumulated in a single peak, which also contained ddb /dcaf -ctd and vprmus (fig a, b, cyan trace).
these results confirm that samhd -ctd is necessary for stable association with ddb /dcaf -ctd/vprmus in vitro, and demonstrate that samhd -ctd is sufficient for vprmus-mediated recruitment of the t l-samhd -ctd fusion construct to ddb /dcaf -ctd. to correlate these data with enzymatic activity, in vitro ubiquitylation assays were conducted by incubating samhd , samhd -Δctd or t l-samhd -ctd with purified crl dcaf -ctd, e (uba ), e (ubch c), ubiquitin and atp. input proteins are shown in s a fig, and control reactions in s b, c fig. in the absence of vprmus, no samhd ubiquitylation was observed (figs c and s d), while addition of vprmus resulted in robust samhd ubiquitylation (figs d and s e). in agreement with the analytical gf data, samhd -Δctd was not ubiquitylated in the presence of vprmus (figs e and s f), while t l-samhd -ctd was ubiquitylated with kinetics similar to those of the full-length protein (figs f and s f). again, these data substantiate the functional importance of samhd -ctd for vprmus-mediated recruitment to the crl dcaf ubiquitin ligase.

crystal structure analysis of apo- and vprmus-bound ddb /dcaf -ctd protein complexes

to obtain structural information regarding vprmus and its mode of binding to the crl substrate receptor dcaf , the x-ray crystal structures of a ddb /dcaf -ctd complex, and ddb /dcaf -ctd/t l-vprmus (residues - ) fusion protein ternary complex were determined. the structures were solved using molecular replacement and refined to resolutions of . Å and . Å respectively (s table). vprmus adopts a three-helix bundle fold, stabilised by coordination of a zinc ion by his and cys residues on helix- and at the c-terminus (fig a). superposition of vprmus with previously determined vpxsm [ ], vpxmnd [ , ], and vprhiv- [ ] structures reveals a conserved three-helix bundle fold, and similar position of the helix bundles on dcaf -ctd (s a fig). in addition, the majority of side chains involved in dcaf -interaction are type-conserved in all vpx and vpr proteins (figs s b-g and s a), strongly suggesting a common molecular mechanism of host crl -dcaf hijacking by the vpx/vpr family of accessory proteins. however, there are also significant differences in helix length and register as well as conformational variation in the loop region n-terminal of helix- , at the start of helix- and in the loop between helices- and - (s a fig). vprmus binds to the side and on top of the disk-shaped -bladed β-propeller (bp) dcaf -ctd domain with a total contact surface area of ~ Å comprising three major regions of interaction. the extended vprmus n-terminus attaches to the cleft between dcaf bp blades and through several hydrogen bonds, electrostatic and hydrophobic interactions (s b-d fig). a second, smaller contact area is formed by hydrophobic interaction between vprmus residues l and e from helix- , and dcaf w , located in a loop on top of bp blade (s e fig). the third interaction surface comprises the c-terminal half of vprmus helix- , which inserts into a ridge on top of dcaf (s f, g fig). superposition of the apo-ddb /dcaf -ctd and vprmus-bound crystal structures reveals conformational changes in dcaf upon vprmus association. binding of the n-terminal arm of vprmus induces only a minor rearrangement of a loop in bp blade (s c fig). by contrast, significant structural changes occur on the upper surface of the bp domain: polar and hydrophobic interactions of dcaf residues p , f , f , n , l , m and t with vprmus side chains of t , r , r and e in helix- result in the stabilisation of the sequence stretch that connects bp blades and (“c-terminal loop”, figs b and s f).
moreover, side chain electrostatic interactions of vprmus residues r , r and r with dcaf e , e and e lock the conformation of an “acidic loop” upstream of bp blade , which is also unstructured and flexible in the absence of vprmus (figs b, c and s d, f). notably, in previously determined structures of vpx/dcaf /samhd complexes the “acidic loop” is a central point of ternary contact, providing a binding platform for positively charged amino acid side chains in either the samhd n- or c-terminus [ - ]. for example, vpxsm positions samhd -ctd in such a way that samhd k engages in electrostatic interaction with the dcaf “acidic loop” residue d (fig c, left panel). however, in the vprmus crystal structure the bound vprmus now blocks access to the corresponding samhd -ctd binding pocket, in particular by the positioning of an extended n-terminal loop that precedes helix- . additionally, vprmus side chains r , r and r neutralise the dcaf “acidic loop”, precluding the formation of further salt bridges to basic residues in samhd -ctd (fig c, right panel). to validate the importance of vprmus residues r and r for dcaf -ctd- and samhd -binding, charge reversal mutations to glutamates were generated by site-directed mutagenesis. the effect of the vprmus r e r e double mutant on complex assembly was then analysed by gf chromatography. sds-page analysis of the resulting chromatographic profile shows an almost complete loss of the ddb /dcaf -ctd/vprmus/samhd complex peak (fig d, fraction ), when compared to the wild type, concomitant with enrichment of (i) vprmus r e r e-bound ddb /dcaf -ctd (fig d, fractions - ), and of (ii) vprmus r e r e/samhd binary complex (fig d, fraction - ).
this suggests that charge reversal of vprmus side chains r and r weakens the strong association with dcaf observed in wild type vprmus, due to loss of electrostatic interaction with the “acidic loop”, in accordance with the crystal structure. consequently, some proportion of vpr-bound samhd dissociates, further indicating that vprmus side chains r and r are not central to samhd interaction.

molecular mechanism of samhd -targeting

to obtain mechanistic insight into vprmus-recruitment of samhd -ctd, we initiated cryo-em analyses of the crl dcaf -ctd/vprmus/samhd assembly. in these studies, the small ubiquitin-like protein nedd was enzymatically attached to the cul subunit, in order to obtain its active form (s a fig) [ ]. a crl -nedd dcaf -ctd/vprmus/samhd complex was reconstituted in vitro and purified by gf chromatography (s b fig). extensive d and d classification of the resulting particle images revealed considerable conformational heterogeneity, especially regarding the position of the cul -nedd /roc subcomplex (stalk) relative to ddb /dcaf /vprmus (core) (s fig). nevertheless, a homogeneous particle population could be separated, which yielded a d reconstruction at a nominal resolution of . Å that contained electron density corresponding to the core (s c-f fig). molecular models of ddb bp domains a and c (bpa, bpc), dcaf -ctd and vprmus, derived from our crystal structure (fig ), could be fitted as rigid bodies into this cryo-em volume (fig a). no obvious electron density was visible for the bulk of samhd . however, close inspection revealed an additional tubular, slightly arcing density feature, approx.
Å in length, located on the upper surface of the vprmus helix bundle, approximately Å away from and opposite of the vprmus/dcaf -ctd binding interface (fig a, red arrows). one end of the tubular volume contacts the middle of vprmus helix- , and the other end forms additional contacts to the c-terminus of helix- and the n-terminus of helix- . a local resolution of . - Å precluded the fitting of an atomic model. considering the biochemical data, showing that samhd -ctd is sufficient for recruitment to ddb /dcaf /vprmus, we hypothesise that this observed electron density feature corresponds to the region of samhd -ctd which physically interacts with vprmus. given its dimensions, the putative samhd -ctd density could accommodate approx. amino acid residues in a fully extended conformation or up to residues in a kinked helical arrangement. all previous crystal structure analyses [ ], as well as secondary structure predictions indicate that samhd residues c-terminal to the catalytic hd domain and c-terminal lobe (amino acids - ) are disordered in the absence of additional binding partners. accordingly, the globular domains of the samhd molecule might be flexibly linked to the c-terminal tether identified here. in that case, the bulk of samhd samples a multitude of positions relative to the ddb /dcaf -ctd/vprmus core, and consequently is averaged out in the process of cryo-em reconstruction. the topology of crl dcaf -ctd/vprmus/samhd and the binding region of samhd -ctd were further assessed by cross-linking mass spectrometry (clms) using the photo-reactive cross-linker sulfo-sda [ ]. a large number of cross-links between samhd and the c-terminal half of cul , the side and top of dcaf -ctd, and bp blades - of ddb were found, consistent with highly variable positioning of the sam and hd domains of samhd relative to the crl core (fig b).
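The estimate of how many residues the tubular density could hold can be reproduced with simple arithmetic: divide the length of the density by the rise per residue of the assumed backbone conformation. The sketch below is illustrative only and is not the authors' procedure. The rise values (about 3.4 Å per residue for an extended chain, about 1.5 Å per residue for an α-helix) are textbook approximations, the function names are hypothetical, and the 30 Å length in the usage line is a placeholder rather than the value measured in the paper.

```python
# Approximate rise per residue along the chain axis, in angstroms.
# Textbook approximations (assumptions), not values taken from the paper.
RISE_EXTENDED = 3.4  # extended, beta-strand-like backbone
RISE_HELIX = 1.5     # alpha-helical backbone


def residues_spanning(length_angstrom: float, rise_per_residue: float) -> int:
    """Approximate number of residues needed to span a given length."""
    if length_angstrom < 0 or rise_per_residue <= 0:
        raise ValueError("length must be >= 0 and rise must be > 0")
    return round(length_angstrom / rise_per_residue)


def density_capacity(length_angstrom: float) -> dict:
    """Residue counts for an extended and a helical interpretation of a density."""
    return {
        "extended": residues_spanning(length_angstrom, RISE_EXTENDED),
        "helical": residues_spanning(length_angstrom, RISE_HELIX),
    }


if __name__ == "__main__":
    # 30.0 is a hypothetical length used only to exercise the function.
    print(density_capacity(30.0))
```

Because a helix advances less per residue than an extended chain, the helical interpretation always accommodates more residues for the same density length, which matches the direction of the two estimates given in the text.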
moreover, multiple cross-links between samhd -ctd and vprmus were observed, more specifically locating to a sequence stretch comprising the c-terminal half of vprmus helix- (residues a -e ), and to a portion of the disordered vprmus c-terminus (residues y , y ). these data are in accordance with the presence of samhd -ctd in the unassigned cryo-em density and its role as vprmus tether. the remaining samhd -ctd cross-links were with the c-terminus of cul and the “acidic loop” of dcaf (fig b). distance restraints from these samhd -ctd cross-links, together with our structural models of crl dcaf -ctd/vprmus (see below), were employed to visualise the interaction space accessible to the centre of mass of samhd -ctd. this analysis is compatible with recruitment of samhd -ctd on top of the vprmus helix bundle as indicated by cryo-em (fig c). interestingly, cross-links to vprmus were restricted to the c-terminal end of samhd -ctd (residues k , k ), while cross-links to cul and dcaf were found in the n-terminal portion (residues k , k , t -s ). these observations are consistent with a model where the very c-terminus of samhd is immobilised on vprmus, and samhd -ctd residues further upstream are exposed to the catalytic machinery surrounding the cul c-terminal domain. to further probe the interaction, vprmus amino acid residues in close proximity to the putative samhd -ctd density were substituted by site-directed mutagenesis. specifically, vprmus w was changed to alanine to block a hydrophobic contact with samhd -ctd involving the aromatic side chain, and vprmus a was changed to a bulky tryptophan, in order to introduce a steric clash with samhd -ctd (fig d).
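The idea of an interaction space defined by cross-link distance restraints can be illustrated with a small geometric sketch: sample a grid around the cross-linked residues of the fixed model and keep every point that lies within the cross-linker reach of all of them. This is a simplified stand-in under stated assumptions, not the software or protocol used by the authors. The function name, the grid spacing and the distance cutoff are hypothetical, and real analyses would also exclude points that clash with the model.

```python
import itertools
import math


def _axis(lo: float, hi: float, step: float) -> list:
    """Evenly spaced coordinates from lo up to (at most) hi."""
    count = int(math.floor((hi - lo) / step)) + 1
    return [lo + i * step for i in range(count)]


def accessible_points(anchors, max_distance, grid_spacing=2.0):
    """Grid points within max_distance of every anchor coordinate.

    anchors      -- list of (x, y, z) tuples, e.g. C-alpha positions of
                    cross-linked residues on the fixed model (hypothetical input)
    max_distance -- upper distance bound of each restraint, in angstroms
                    (an assumed cutoff, not taken from the paper)
    """
    if not anchors:
        raise ValueError("at least one anchor is required")
    lows = [min(a[i] for a in anchors) - max_distance for i in range(3)]
    highs = [max(a[i] for a in anchors) + max_distance for i in range(3)]
    axes = [_axis(lo, hi, grid_spacing) for lo, hi in zip(lows, highs)]
    return [
        point
        for point in itertools.product(*axes)
        if all(math.dist(point, a) <= max_distance for a in anchors)
    ]


if __name__ == "__main__":
    # Two made-up anchor positions 10 angstroms apart.
    demo = accessible_points([(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)], 8.0, 1.0)
    print(len(demo), "grid points satisfy both restraints")
```

Requiring every restraint to hold at once is the strictest reading. With flexible regions it is common to treat restraints as satisfiable by different conformers, so a looser union or per-restraint view may be more appropriate for a disordered tether.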
this vprmus w a a w double mutant was then assessed for complex formation with ddb /dcaf -ctd and samhd by analytical gf. in comparison to wild type vprmus, the w a a w mutant showed a reduction of ddb /dcaf -ctd/vprmus/samhd complex peak intensity (fig e, fraction ), concomitant with (i) enrichment of ddb /dcaf -ctd/vprmus ternary complex, sub-stoichiometrically bound to samhd (fig e, fraction ), (ii) excess ddb /dcaf -ctd binary complex (fig e, fraction ), and (iii) monomeric samhd species (fig e, fractions - ). in conclusion, this biochemical analysis, together with cryo-em reconstruction at intermediate resolution and clms analysis, locates the samhd -ctd binding site on the upper surface of the vprmus helix bundle. these data allow for structural comparison with neo-substrate binding modes of vpx and vpr proteins from different retrovirus lineages (fig a-d). vpxhiv- and vpxsm position samhd -ctd at the side of the dcaf bp domain through interactions with the n-termini of vpx helices- and - (fig b) [ ]. vpxmnd and vpxrcm bind samhd -ntd using a bipartite interface comprising the side of the dcaf bp and the upper surface of the vpx helix bundle (fig c) [ , ]. vprhiv- engages its ubiquitylation substrate ung using both the top and the upper edge of the vprhiv- helix bundle (fig d) [ ]. of note, these upper-surface interaction interfaces only partially overlap with the vprmus/samhd -ctd binding interface identified here and employ fundamentally different sets of interacting amino acid residues.
thus, it appears that the molecular interaction interfaces driving vpx/vpr-mediated neo- substrate recognition and degradation are not conserved between related siv and hiv vpx/vpr accessory proteins, even in cases where identical samhd -ctd regions are targeted for recruitment. cryo-em analysis of vprmus-modified crl -nedd dcaf -ctd conformational states and dynamics a reanalysis of the cryo-em data using strict selection of high-quality d classes, followed by focussed d classification yielded three additional particle populations, resulting in d reconstructions at - Å resolution, which contained both the vprmus-bound crl core and the stalk (conformational states- , - and - , figs a and s g-j). the quality of the d volumes was sufficient to fit crystallographic models of core (fig ) and the stalk (pdb hye) [ ] as rigid bodies (figs b and s a). for the catalytic ring- domain subunit roc , only fragmented electron density was present near the position it occupies in the crystallographic model (s a fig). in all three states, electron density was selectively absent for the c- terminal cul winged helix b (whb) domain (residues - ), which contains the nedd modification site (k ), and for the preceding α-helix, which connects the cul n-terminal domain to the whb domain (s a fig). in accordance with this observation, the positions of crl -attached nedd and of the crl roc ring domain are sterically incompatible upon superposition of their respective crystal structures (s b fig) [ ]. alignment of d volumes from states- , - and - shows that core densities representing ddb bpa, bpc, dcaf -ctd and vprmus superimpose well, indicating that these components do not undergo major conformational fluctuations and thus form a rigid platform for substrate binding and attachment of the crl stalk (fig ). however, rotation of ddb bpb around a hinge connecting it to bpc results in three different orientations of state- , - and - stalk regions relative to the core. 
bpb rotation angles were measured as ° between state- and - , and ° between state- and - . furthermore, the crosslinks between ddb and cul identified by clms are satisfied by the state- model, but increasingly violated in states- and - , validating in solution the conformational variability observed by cryo-em (s c fig). taken together, this places the crl catalytic machinery, sited at the distal end of the stalk, appropriately to approach the vprmus-tethered bulk of samhd for ubiquitylation at a wide range of angles (fig b). these data are in line with previous predictions based on extensive comparative crystal structure analyses, which postulated an approx. ° rotation of the crl stalk around the core [ , , , , ]. however, the left- and rightmost cul orientations observed here, states- and - from our cryo-em analysis, indicate a slightly narrower stalk rotation range ( °), when compared to the outermost stalk conformations modelled from previously determined crystal structures ( °) (s d fig). an explanation for this discrepancy comes from inspection of the cryo-em densities and fitted models, revealing that along with the main interaction interface on ddb bpb there are additional molecular contacts between cul and ddb . specifically, in state- , there is a contact between the loop connecting helices d and e of cul cullin repeat (cr) (residues - ) and a loop protruding from bp blade of ddb bpc (residues - , s e fig). in state- , the loop between cul cr helices d and e (residues - ) abuts a region in the c-terminal helical domain of ddb (residues - , s f fig).
these auxiliary interactions might be required to lock the outermost stalk positions observed here in order to confine the rotation range of cul .

discussion

our x-ray crystallographic studies of the ddb /dcaf -ctd/vprmus assembly provide the first structural insight into a class of “hybrid” siv vpr proteins. these are present in the sivagm and sivmus/deb/syk lineages of lentiviruses and combine characteristics of related vprhiv- and siv vpx accessory proteins. like siv vpx, “hybrid” vpr proteins down-regulate the host restriction factor samhd by recruiting it to crl dcaf for ubiquitylation and subsequent proteasomal degradation. however, using a combination of x-ray, cryo-em and clms analyses, we show that the molecular strategy, which vprmus evolved to target samhd , is strikingly different from vpx-containing siv strains. in the two clades of vpx proteins, divergent amino acid sequence stretches just upstream of helix- (variable region (vr) , s a fig), together with polymorphisms in the samhd -n-terminus of the respective host species, determine if hiv- -type or sivmnd-type vpx recognise samhd -ctd or samhd -ntd, respectively. these recognition mechanisms result in positioning of samhd -ctd or -ntd on the side of the dcaf bp domain in a way that allows for additional contacts between samhd and dcaf , thus forming ternary vpx/samhd /dcaf assemblies with very low dissociation rates [ - , ]. in vprmus, different principles determine the specificity for samhd -ctd. here, vr is not involved in samhd -ctd-binding at all, but forms additional interactions with dcaf , which are not observed in vpx/dcaf protein complexes (s a fig).
molecular contacts between vprmus and samhd are dispersed on helices- and - , facing away from the dcaf interaction site and immobilising samhd -ctd on the top side of the vprmus helix bundle (s a fig). placement of samhd -ctd in such a position precludes stabilising ternary interaction with dcaf -ctd, but still results in robust samhd ubiquitylation in vitro and samhd degradation in cell-based assays [ ]. predictions regarding the molecular mechanism of samhd -binding by other “hybrid” vpr orthologues are difficult due to sequence divergence. even in vprdeb, the closest relative to vprmus, only approximately % of amino acid side chains lining the putative samhd -ctd binding pocket are conserved (s a fig). previous in vitro ubiquitylation and cell-based degradation experiments did not show a clear preference of vprdeb for recruitment of either samhd -ntd or –ctd [ , ]. furthermore, it is disputed if vprdeb actually binds dcaf [ ], which might possibly be explained by amino acid variations in the very n-terminus and/or in helix- (s a fig). vprsyk is specific for samhd -ctd [ ], but the majority of residues forming the binding platform for samhd -ctd observed in the present study are not conserved. the sivagm lineage of vpr proteins is even more divergent, with significant differences not only in possible samhd -contacting residues, but also in the sequence stretches preceding helix- , and connecting helices- and - , as well as in the n-terminal half of helix- (s a fig). furthermore, there are indications that recruitment of samhd by the vpragm.gri sub-type involves molecular recognition of both samhd -ntd and –ctd [ , ]. in conclusion, recurring rounds of evolutionary lentiviral adaptation to the host samhd restriction factor, followed by host re-adaptation, resulted in highly species-specific, diverse molecular modes of vpr-samhd interaction. 
in addition to the example presented here, further structural characterisation of samhd -vpr complexes will be necessary to illustrate the manifold outcomes of this particular virus-host molecular “arms race”. previous structural investigation of ddb /dcaf /vprhiv- in complex with the neo-substrate ung demonstrated that vprhiv- engages ung by mimicking the dna phosphate backbone. more precisely, ung residues, which project into the major groove of its endogenous dna substrate, insert into a hydrophobic cleft formed by vprhiv- helices- , - and the n-terminal half of helix- [ ]. this mechanism might rationalise vprhiv- ’s extraordinary binding promiscuity, since the list of potential vprhiv- degradation substrates is significantly enriched in dna- and rna-binding proteins [ ]. moreover, promiscuous vprhiv- -induced degradation of host factors with dna- or rna-binding activity has been proposed to induce cell cycle arrest at the g /m phase border, which is the most thoroughly described phenotype of vpr proteins so far [ , , ]. in vprmus, the n-terminal half of helix- as well as the bulky amino acid residue w , which is also conserved in vpragm and vpx, constrict the hydrophobic cleft (s a, b fig). furthermore, the extended n-terminus of vprmus helix- is not compatible with ung -binding due to steric exclusion (s c fig). in accordance with these observations, vprmus does not down-regulate ung in a human t cell line [ ]. however, vprmus, vprsyk and vpragm also cause g /m cell cycle arrest in their respective host cells [ , , ].
this strongly hints at the existence of further structural determinants in vprmus, vprsyk, vpragm and potentially vprhiv- , which regulate recruitment and ubiquitylation of dna/rna-binding host factors, in addition to the hydrophobic, dna-mimicking cleft on top of the three-helix bundle. future efforts to structurally characterise these determinants will further extend our understanding of how the vpx/vpr helical scaffold binds, and in this way adapts to a multitude of neo-substrate epitopes. in addition, such efforts might inform approaches to design novel crl dcaf -based synthetic degraders, in the form of proteolysis-targeting chimera-(protac-) type compounds [ , ]. our cryo-em reconstructions of crl dcaf -ctd/vprmus/samhd , complemented by clms, also provide insights into the structural dynamics of crl assemblies prior to ubiquitin transfer. the data confirm previously described rotational movement of the crl stalk, in the absence of constraints imposed by a crystal lattice, creating a ubiquitylation zone around the vprmus-modified substrate receptor (figs and a) [ , , , , ]. missing density for the neddylated cul whb domain and for the catalytic roc ring domain indicates that these distal stalk elements are highly mobile and likely sample a multitude of orientations relative to the cul scaffold (fig b). these observations are in line with structure analyses of crl and crl , where cul / neddylation leads to re-orientation of the cullin whb domain, and to release of the roc ring domain from the cullin scaffold, concomitant with stimulation of ubiquitylation activity [ ].
moreover, recent cryo-em structure analysis of crl β-trcp/iκbα demonstrated substantial mobility of pre-catalytic nedd -cul whb and roc ring domains [ ]. such flexibility seems necessary to structurally organise multiple crl -dependent processes, in particular the nucleation of a catalytic assembly, involving intricate protein-protein interactions between nedd , cul , ubiquitin-charged e and substrate receptor. this synergistic assembly then steers the ubiquitin c-terminus towards a substrate lysine for priming with ubiquitin [ ]. accordingly, our cryo-em studies might indicate that similar principles apply for crl -catalysed ubiquitylation. however, to unravel the catalytic architecture of crl , sophisticated cross-linking procedures as in reference ( ) will have to be pursued. intrinsic mobility of crl stalk elements might assist the accommodation of a variety of sizes and shapes of substrates in the crl ubiquitylation zone and might rationalise the wide substrate range accessible to crl ubiquitylation through multiple dcaf receptors. owing to selective pressure to counteract the host’s samhd restriction, hiv- and certain sivs, amongst other viruses, have taken advantage of this dynamic crl architecture by modification of the dcaf substrate receptor with vpx/vpr-family accessory proteins. by tethering either samhd -ctd or -ntd to dcaf , and in this way flexibly recruiting the bulk of samhd , the accessibility of lysine side chains both tether-proximal and on the samhd globular domains to the crl catalytic assembly might be further improved (fig c, d). this ensures efficient vpx/vpr-mediated samhd priming, poly-ubiquitylation and proteasomal degradation to stimulate virus replication. methods protein expression and purification constructs were pcr-amplified from cdna templates and inserted into the indicated expression plasmids using standard restriction enzyme methods (s table). pacghlt-b-ddb (plasmid # ) and pet -uba (plasmid # ) were obtained from addgene. 
the popc-uba -gst-appbp co-expression plasmid, and the pgex p -ubc plasmid were obtained from mrc-ppu reagents and services (clones , ). bovine erythrocyte ubiquitin and recombinant hsnedd were purchased from sigma-aldrich (u ) and bostonbiochem (ul- ) respectively. point mutations were introduced by site-directed mutagenesis using kod polymerase (novagen). all constructs and variants are summarised in s table. proteins expressed from vectors pacghlt-b, pgex p / , popc and pet b contained an n-terminal gst-his-tag; phissumo – n-terminal his-sumo-tag; pet , prsf-duet- – n-terminal his-tag; ptri-ex- – c-terminal his-tag. constructs in vectors pacghlt-b and ptri-ex- were expressed in sf cells, and constructs in vectors pet , pet b, pgex p / , prsf-duet- , and phissumo in e. coli rosetta (de ). recombinant baculoviruses (autographa californica nucleopolyhedrovirus clone c ) were generated as described previously [ ]. sf cells were cultured in insect-xpress medium (lonza) at °c in an innova r incubator shaker (new brunswick) at a shaking speed of rpm. in a typical preparation, l of sf cells at × cells/ml were co-infected with ml of high titre ddb virus and ml of high titre dcaf -ctd virus for h. for a typical e. coli rosetta (de ) expression, l of lb medium was inoculated with ml of an overnight culture and grown in a multitron ht incubator shaker (infors) at °c, rpm until od reached . . at that point, temperature was reduced to °c, protein expression was induced by addition of . mm iptg, and cultures were grown for further h. during co-expression of cul and roc from prsf-duet, µm zinc sulfate was added to the growth medium before induction.
sf cells were pelleted by centrifugation at rpm, °c for min using a jla . centrifuge rotor (beckman). e. coli cells were pelleted by centrifugation at rpm, °c for min using the same rotor. cell pellets were resuspended in buffer containing mm tris, ph . , mm nacl, mm mgcl , . mm tris-( -carboxyethyl)-phosphine (tcep), mini-complete protease inhibitors ( tablet per ml) and mm imidazole (for his-tagged proteins only). ml of lysis buffer was used for resuspension of a pellet from l sf culture, and ml lysis buffer per pellet from l e. coli culture. before resuspension of cul /roc co-expression pellets, the buffer ph was adjusted to . . µl benzonase (merck) was added and the cells were lysed by passing the suspension at least twice through a microfluidiser (microfluidics). lysates were clarified by centrifugation at xg for min at °c.

protein purification was performed at °c on an Äkta pure fplc (ge) using xk / chromatography columns (ge) containing ml of the appropriate affinity resin. gst-tagged proteins were captured on glutathione-sepharose (gsh-sepharose ff, ge), washed with ml of wash buffer ( mm tris-hcl ph . , mm nacl, mm mgcl , . mm tcep), and eluted with the same buffer supplemented with mm reduced glutathione. his-tagged proteins were immobilised on ni-sepharose hp (ge), washed with ml of wash buffer supplemented with mm imidazole, and eluted with wash buffer containing . m imidazole. eluent fractions were analysed by sds-page, and appropriate fractions were pooled and reduced to ml using centrifugal filter devices (vivaspin). if applicable, µg gst- c protease, or µg thrombin, per mg total protein, was added and the sample was incubated for h on ice to cleave off affinity tags.
as a second purification step, gel filtration chromatography (gf) was performed on an Äkta prime plus fplc (ge), with superdex / columns (ge), equilibrated in mm tris-hcl ph . , mm nacl, mm mgcl , . mm tcep buffer, at a flow rate of ml/min. for purification of the cul /roc complex, the ph of all purification buffers was adjusted to . . peak fractions were analysed by sds-page, appropriate fractions were pooled and concentrated to approx. mg/ml, flash-frozen in liquid nitrogen in small aliquots and stored at - °c. protein concentrations were determined with a nanodrop spectrophotometer (nd , peqlab), using theoretical absorption coefficients calculated based upon the amino acid sequence by protparam on the expasy webserver [ ].

analytical gel filtration analysis

prior to gel filtration analysis, affinity tags were removed by incubation of µg gst- c protease with µm of each protein component in a volume of µl wash buffer, followed by incubation on ice for h. in order to remove the cleaved gst-tag and gst- c protease, μl gsh-sepharose ff beads (ge) were added and the sample was rotated at °c for one hour. gsh-sepharose beads were removed by centrifugation at °c, rpm for min, and µl of the supernatant was loaded on an analytical gf column (superdex / gl, ge), equilibrated in mm tris-hcl ph . , mm nacl, mm mgcl , . mm tcep, at a flow rate of . ml/min. ml fractions were collected and analysed by sds-page.

in vitro ubiquitylation assays

µl reactions were prepared, containing . µm substrate (indicated samhd constructs, s fig), . µm ddb /dcaf -ctd, . µm cul /roc , . µm hissumo-t l-vprmus (residues - ), . µm ubch c, µm ubiquitin in mm tris-hcl ph . , mm nacl, . mm mgcl , . mm atp.
in control reactions, certain components were left out as indicated in s fig. a µl sample for sds-page analysis was taken (t= ). reactions were initiated by addition of . µm uba , incubated at °c, and µl sds-page samples were taken after min, min, min and min, immediately mixed with µl x sds sample buffer and boiled at °c for min. samples were analysed by sds-page.

in vitro neddylation of cul /roc

for initial neddylation tests, a µl reaction was prepared, containing µm cul /roc , . µm ubc , µm nedd in mm tris-hcl ph . , mm nacl, . mm mgcl , . mm atp. x µl samples were taken for sds-page, one was immediately mixed with µl x sds sample buffer, the other one incubated for min at °c. the reaction was initiated by addition of . µm appbp /uba , incubated at °c, and µl sds-page samples were taken after min, min, min, min and min, immediately mixed with µl x sds sample buffer and boiled at °c for min. samples were analysed by sds-page. based on this test, the reaction was scaled up to ml and incubated for min at °c. the reaction was quenched by addition of mm tcep and immediately loaded onto a superdex / gf column (ge), equilibrated in mm tris-hcl ph . , mm nacl, mm mgcl , . mm tcep at a flow rate of ml/min. peak fractions were analysed by sds-page, appropriate fractions were pooled and concentrated to ~ mg/ml, flash-frozen in liquid nitrogen in small aliquots and stored at - °c.

x-ray crystallography sample preparation, crystallisation, data collection and structure solution

ddb /dcaf -ctd complex. ddb /dcaf -ctd crystals were grown by the hanging drop vapour diffusion method, by mixing equal volumes ( µl) of ddb /dcaf -ctd solution at mg/ml with reservoir solution containing mm tri-na citrate ph .
, % peg and suspending over a µl reservoir. crystals grew over night at °c. crystals were cryo-protected in reservoir solution supplemented with % glycerol and cryo-cooled in liquid nitrogen. a data set from a single crystal was collected at diamond light source (didcot, uk) at a wavelength of . Å. data were processed using xds [ ] (s table), and the structure was solved using molecular replacement with the program molrep [ ] and available structures of ddb (pdb e c) and dcaf -ctd (pdb cc ) [ ] as search models. iterative cycles of model adjustment with the program coot [ ], followed by refinement using the program phenix [ ] yielded final r/rfree factors of . %/ . % (s table). in the model, . % of residues have backbone dihedral angles in the favoured region of the ramachandran plot, the remainder fall in the allowed regions, and none are outliers. details of data collection and refinement statistics are presented in s table. coordinates and structure factors have been deposited in the pdb, accession number zue. ddb /dcaf -ctd/t l-vprmus ( - ) complex. the ddb /dcaf -ctd/vprmus complex was assembled by incubation of purified ddb /dcaf -ctd and hissumo-t l-vprmus (residues - ), at a : molar ratio, in a buffer containing mm bis-tris propane ph . , . m nacl, mm mgcl , . mm tcep, containing mg of hrv- c protease for hissumo-tag removal. after incubation on ice for h, the sample was loaded onto a superdex / gf column (ge), with a ml gsh- sepharose ff column (ge) connected in line. the column was equilibrated with mm bis-tris propane ph . , mm nacl, mm mgcl , and . mm tcep. the column flow rate was ml/min. gf fractions were analysed by sds-page, appropriate fractions were pooled and concentrated to . mg/ml. crystals were prepared by the sitting drop vapour diffusion method, by mixing equal volumes ( nl) of the protein complex at . mg/ml and reservoir solution containing - % peg (w/v), mm mgcl , mm hepes-naoh, ph . - . . the reservoir volume was µl. 
crystals grew after at least weeks of incubation at °c. crystals were cryo-protected in reservoir solution supplemented with % glycerol and cryo-cooled in liquid nitrogen. data sets from two single crystals were collected, initially at bessy ii (helmholtz-zentrum berlin, hzb) at a wavelength of . Å, and later at esrf (grenoble) at a wavelength of Å. data sets were processed separately using xds [ ] and xdsapp [ ]. the structure was solved by molecular replacement, using the initial bessy data set, with the program phaser [ ], and the following structures as search models: ddb /dcaf -ctd (this work) and t l variant e h (pdb qt ) [ ]. after optimisation of the initial model and refinement against the higher-resolution esrf data set, vprmus was placed manually into the density, using an nmr model of vprhiv- (pdb m l) [ ] as guidance. iterative cycles of model adjustment with the program coot [ ], followed by refinement using the program phenix [ ] yielded final r/rfree factors of . %/ . %. in the model, . % of residues have backbone dihedral angles in the favoured region of the ramachandran plot, the remainder fall in the allowed regions, and none are outliers. details of data collection and refinement statistics are presented in s table. coordinates and structure factors have been deposited in the pdb, accession number zx .

cryo-em sample preparation and data collection

complex assembly. purified cul -nedd /roc , ddb /dcaf -ctd, gst-vprmus and rhesus macaque samhd , µm each, were incubated in a final volume of ml of mm tris-hcl ph . , mm nacl, mm mgcl , . mm tcep, supplemented with mg of gst- c protease.
after incubation on ice for h, the sample was loaded onto a superdex / gf column (ge), equilibrated with the same buffer at ml/min, with a ml gsh-sepharose ff column (ge) connected in line. gf fractions were analysed by sds-page, appropriate fractions were pooled and concentrated to . mg/ml.

grid preparation. . µl protein solution containing . µm cul -nedd /roc /ddb /dcaf -ctd/vprmus/samhd complex and . µm ubch c-ubiquitin conjugate (s a, b fig) were applied to a mesh quantifoil r / cu/rh holey carbon grid (quantifoil micro tools gmbh) coated with an additional thin carbon film as sample support and stained with % uranyl acetate for initial characterisation. for cryo-em, a fresh mesh quantifoil r . / . cu holey carbon grid (quantifoil micro tools gmbh) was glow-discharged for s using a harrick plasma cleaner with technical air at . mbar and w. . µl protein solution containing . µm cul -nedd /roc /ddb /dcaf -ctd/vprmus/samhd complex and µm ubch c-ubiquitin conjugate were applied to the grid, incubated for s, blotted with a vitrobot mark ii device (fei, thermo fisher scientific) for - s at °c and % humidity, and plunged in liquid ethane. grids were stored in liquid nitrogen until imaging.

cryo-em data collection. initial negative stain and cryo-em datasets were collected automatically for sample quality control and low-resolution reconstructions on a kv tecnai spirit cryo-em (fei, thermo fisher scientific) equipped with a f cmos camera (tvips) using leginon [ , ]. particle images were then analysed by d classification and initial model reconstruction using sphire [ ], cistem [ ] and relion . [ ].
these data revealed the presence of the complexes containing both ddb /dcaf -ctd/vprmus (core) and cul /roc (stalk). high-resolution data was collected on a kv tecnai polara cryo-em (fei, thermo fisher scientific) equipped with a k summit direct electron detector (gatan) at a nominal magnification of x, with a pixel size of . Å/px on the object scale. in total, movie stacks were collected in super-resolution mode using leginon [ , ] with the following parameters: defocus range of . - . µm, frames per movie, s exposure time, electron dose of . e/Å /s and a cumulative dose of e/Å per movie.

cryo-em computational analysis

movies were aligned and dose-weighted using motioncor [ ] and initial estimation of the contrast transfer function (ctf) was performed with the ctffind package [ ]. resulting micrographs were manually inspected to exclude images with substantial contaminants (typically large protein aggregates or ice contaminations) or grid artefacts. power spectra were manually inspected to exclude images with astigmatic, weak, or poorly defined spectra. after these quality control steps the dataset included micrographs ( % of total). at this stage, the data set was picked twice and processed separately, to yield reconstructions of the core (analysis ) and states- , - and - (analysis ).

for analysis , particle positions were determined using template matching with a filtered map comprising core and stalk using the software gautomatch (https://www .mrc-lmb.cam.ac.uk/research/locally-developed-software/zhang-software/). , particle images were found, extracted with relion . and subsequently d-classified using cryosparc [ ], resulting in , particle images after selection (s c, d fig). these particle images were separated into two equally sized subsets and tier d-classification was performed using relion . on both of them to reduce computational burden (s d fig). the following parameters were used: initial model=“core”, number of classes k= , t= , global step search= . °, number of iterations= , pixel size . Å/px. from these, the ones possessing both core and stalk were selected. classes depicting a similar stalk orientation relative to the core were pooled and directed into tier as three different subpopulations containing , , , and , particle images, respectively (s d fig). for tier , each subpopulation was classified separately into classes each. from these classes, all particle images exhibiting well-defined densities for core and stalk were pooled and labelled “core+stalk”, resulting in , particle images in total. , particle images representing classes containing only the core were pooled and labelled “core” (s d fig). for tier , the “core” particle subset was separated into classes which yielded uninterpretable reconstructions lacking medium- or high-resolution features. the “core+stalk” subset was separated into classes, with classes containing both stalk and core (s d fig) and one class consisting only of the core with vprmus bound. the classes with stalk showed similar stalk orientations as the ones obtained from analysis (see below, s fig), but refined individually to lower resolution than in analysis and were discarded. however, individual refinement of the core-only tier class yielded a . Å reconstruction (s e, f fig).

for analysis , particle positions were determined using cistem's gaussian picking routine, yielding , particle images in total. after two rounds of d-classification, , particle images were selected for further processing (s g, h fig). using this data, an initial model was created using relion . .
the resulting map yielded strong signal for the core but only fragmented stalk density, indicating a large heterogeneity in the stalk-region within the data set. this large degree of compositional (+/- stalk) and conformational heterogeneity (movement of the stalk relative to the core) made the classification challenging. accordingly, alignment and classification were carried out simultaneously. the first objective was to separate the data set into three categories: “junk”, “core” and “core+stalk”. therefore, the stalk was deleted from the initial model using the “eraser”-tool in chimera [ ]. this core-map was used as an initial model for the tier d-classification with relion . at a decimated pixel size of . Å/px. the following parameters were used: number of classes k= , t= , global step search= . °, number of iterations= . the classification yielded two classes containing the stalk (classes and containing % and % of the particle images, respectively) (s h fig). these particles were pooled and directed into tier d-classification using the following parameters: number of classes k= , t= , global step search= . °, number of iterations= . three of these classes yielded medium-resolution maps with interpretable features (states- , - and - , s h fig). these three classes were refined individually using d relion . , resulting in maps with resolution ranging from . Å – . Å (s h-j fig).
molecular visualisation, rigid body fitting, d structural alignments, rotation and interface analysis
density maps and atomic models were visualised using coot [ ], pymol (schrödinger) and ucsf chimera [ ]. rigid body fits and structural alignments were performed using the program ucsf chimera [ ].
rotation angles between extreme ddb bpb domain positions were measured using the dyndom server [ ] (http://dyndom.cmp.uea.ac.uk/dyndom/rundyndom.jsp). molecular interfaces were analysed using the ebi pdbepisa server [ ] (https://www.ebi.ac.uk/msd-srv/prot_int/cgi-bin/piserver).
multiple sequence alignment
a multiple sequence alignment was calculated using the ebi clustalomega server [ ] (https://www.ebi.ac.uk/tools/msa/clustalo/), and adjusted manually using the program genedoc [ ].
cross-linking mass spectrometry (clms)
complex assembly. purified cul /roc , ddb /dcaf -ctd, gst-vprmus and rhesus macaque samhd , µm each, were incubated in a volume of ml buffer containing mm hepes ph . , mm nacl, mm mgcl , . mm tcep, supplemented with mg gst- c protease. after incubation on ice for h, the sample was loaded onto a superdex / gf column (ge), equilibrated with the same buffer, at a flow rate of ml/min with a ml gsh-sepharose ff column (ge) connected in line. gf fractions were analysed by sds-page, and appropriate fractions were pooled and concentrated to mg/ml.
photo-crosslinking. the cross-linker sulfo-sda (sulfosuccinimidyl , ′-azipentanoate) (thermo scientific) was dissolved in cross-linking buffer ( mm hepes ph . , mm nacl, mm mgcl , . mm tcep) to mm before use. the labelling step was performed by incubating μg aliquots of the complex at mg/ml with , , . , . , . mm sulfo-sda, respectively, for an hour. the samples were then irradiated with uv light at nm for min to form cross-links and quenched with mm nh hco for min. all steps were performed on ice. reaction products were separated on a novex bis-tris – % sds−page gel (life technologies).
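The cross-linker titration described above amounts to a dilution series from a single stock, which follows the usual relation C1·V1 = C2·V2. The sketch below illustrates that arithmetic only. Every number in it (stock concentration, reaction volume, final concentrations) is a hypothetical placeholder chosen for this example, because the actual values are not recoverable from the text, and the function name is likewise an assumption.

```python
def stock_volume_ul(stock_mm, final_mm, final_volume_ul):
    """Volume of stock (in µl) needed to reach final_mm in final_volume_ul.

    Uses C1 * V1 = C2 * V2, solved for V1.
    """
    if stock_mm <= 0:
        raise ValueError("stock concentration must be positive")
    if final_mm > stock_mm:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_mm * final_volume_ul / stock_mm


# Hypothetical placeholders, NOT the values used in the study.
STOCK_MM = 10.0          # assumed cross-linker stock concentration (mM)
FINAL_VOLUME_UL = 20.0   # assumed reaction volume per aliquot (µl)
FINAL_CONCS_MM = [2.0, 1.0, 0.5, 0.25, 0.125]  # assumed titration series (mM)

for conc in FINAL_CONCS_MM:
    volume = stock_volume_ul(STOCK_MM, conc, FINAL_VOLUME_UL)
    print(f"{conc} mM final: add {volume} µl stock to a {FINAL_VOLUME_UL} µl reaction")
```

The guard against a final concentration above the stock concentration catches the most common transcription mistake when such a series is planned.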
the gel band corresponding to the cross-linked complex was excised and digested with trypsin (thermo scientific pierce) [ ] and the resulting tryptic peptides were extracted and desalted using c stagetips [ ]. eluted peptides were fractionated on a superdex peptide . / increase column (ge healthcare) at a flow rate of µl/min using % (v/v) acetonitrile and . % (v/v) trifluoroacetic acid as mobile phase. μl fractions were collected and vacuum-dried.
clms acquisition. samples for analysis were resuspended in . % (v/v) formic acid, . % (v/v) acetonitrile. lc-ms/ms analysis was performed on an orbitrap fusion lumos tribrid mass spectrometer (thermo fisher) coupled on-line with an ultimate rslcnano hplc system (dionex, thermo fisher). samples were separated on a cm easy-spray column (thermo fisher). mobile phase a consisted of . % (v/v) formic acid and mobile phase b of % (v/v) acetonitrile with . % (v/v) formic acid. flow rates were . μl/min using gradients optimized for each chromatographic fraction from offline fractionation, ranging from % mobile phase b to % mobile phase b over min. ms data were acquired in data-dependent mode using the top-speed setting with a s cycle time. for every cycle, the full scan mass spectrum was recorded using the orbitrap at a resolution of , in the range of to , m/z. ions with a precursor charge state between + and + were isolated and fragmented. analyte fragmentation was achieved by higher-energy collisional dissociation (hcd) [ ] and fragmentation spectra were then recorded in the orbitrap with a resolution of , . dynamic exclusion was enabled with single repeat count and s exclusion duration.
clms processing.
a recalibration of the precursor m/z was conducted based on high-confidence (< % false discovery rate (fdr)) linear peptide identifications. the re-calibrated peak lists were searched against the sequences and the reversed sequences (as decoys) of cross-linked peptides using the xi software suite (v. . . . ) for identification [ ]. final crosslink lists were compiled using the identified candidates filtered to < % fdr on link level with xifdr v. . [ ] imposing a minimum of % sequence coverage and observed fragments per peptide.
clms analysis. in order to sample the accessible interaction volume of the samhd -ctd consistent with clms data, a model for samhd was generated using i-tasser [ ]. the samhd -ctd, which adopted a random coil configuration, was extracted from the model. in order to map all crosslinks, missing loops in the complex structure were generated using modeller [ ]. an interaction volume search was then submitted to the disvis webserver [ ] with an allowed distance between . Å and Å for each restraint using the "complete scanning" option. the rotational sampling interval was set to . ° and the grid voxel spacing to Å. the accessible interaction volume was visualised using ucsf chimera [ ].
acknowledgments
we thank the mpi-mg for granting access to the tem instruments of the microscopy and cryo-em service group. we thank manfred weiss and the scientific staff of the bessy-mx (macromolecular x-ray crystallography)/helmholtz zentrum berlin für materialien und energie at beamlines bl . , bl . , and bl . operated by the joint berlin mx-laboratory at the bessy ii electron storage ring (berlin-adlershof, germany) as well as the scientific staff of the esrf (grenoble, france) at beamlines id a- , id b, id - , id - , and id for continuous support. we acknowledge diamond light source (didcot, uk) for access and support of the synchrotron beamline i and cryo-em facilities at the uk's national electron bio-imaging centre (ebic).
furthermore, the authors acknowledge the north-german supercomputing alliance (hlrn) and the hpc for research cluster of the berlin institute of health for providing hpc resources. the phissumo plasmid was a generous gift from dr. evangelos christodoulou (the francis crick institute, uk). the rhesus macaque samhd cdna template was a generous gift from prof. michael emerman (fred hutchinson cancer research center, seattle, usa). recombinant bac : ko bacmid was a generous gift from prof. ian jones (university of reading, uk). pacghlt-b-ddb was a gift from ning zheng (addgene plasmid ). pet -me was a gift from jorge eduardo azevedo (addgene plasmid ).
data availability
the coordinates and structure factors for the crystal structures have been deposited at the protein data bank (pdb) with the accession codes zue (ddb /dcaf -ctd) and zx (ddb /dcaf -ctd/t l-vprmus - ). cryo-em reconstructions have been deposited at the electron microscopy data bank (emdb) with the accession codes emd- (core), emd- (conformational state- ), emd- (state- ) and emd- (state- ). clms data have been deposited at the pride database [ ] with the accession code pxd , reviewer password fcrqg u .
references
. randow f, lehner pj. viral avoidance and exploitation of the ubiquitin system. nat cell biol. ; ( ): - . doi: . /ncb - . . isaacson mk, ploegh hl. ubiquitination, ubiquitin-like modifiers, and deubiquitination in viral infection. cell host & microbe. ; ( ): - .
doi: . /j.chom. . . . . gustin jk, moses av, fruh k, douglas jl. viral takeover of the host ubiquitin system. front microbiol. ; : . doi: . /fmicb. . . . barry m, fruh k. viral modulators of cullin ring ubiquitin ligases: culling the host defense. science's stke : signal transduction knowledge environment. ; ( ):pe . epub / / . doi: . /stke. pe . . mahon c, krogan nj, craik cs, pick e. cullin e ligases and their rewiring by viral factors. biomolecules. ; ( ): - . epub / / . doi: . /biom . . becker t, le-trilling vtk, trilling m. cellular cullin ring ubiquitin ligases: druggable host dependency factors of cytomegaloviruses. int j mol sci. ; ( ). doi: . /ijms . . seissler t, marquet r, paillart jc. hijacking of the ubiquitin/proteasome pathway by the hiv auxiliary proteins. viruses. ; ( ). doi: . /v . . zheng n, shabek n. ubiquitin ligases: structure, function, and regulation. annu rev biochem. ; : . - . . sauter d, kirchhoff f. key viral adaptations preceding the aids pandemic. cell host & microbe. ; ( ): - . doi: . /j.chom. . . . . sharp pm, hahn bh. origins of hiv and the aids pandemic. cold spring harbor perspectives in medicine. ; ( ):a . epub / / . doi: . /cshperspect.a . . hatziioannou t, del prete gq, keele bf, estes jd, mcnatt mw, bitzegeio j, et al. hiv- -induced aids in monkeys. science. ; ( ): - . epub / / . doi: . /science. . . malim mh, bieniasz pd. hiv restriction factors and mechanisms of evasion. cold spring harbor perspectives in medicine. ; ( ):a . epub / / . doi: . /cshperspect.a . . fischer es, scrima a, bohm k, matsumoto s, lingaraju gm, faty m, et al. the molecular basis of crl ddb /csa ubiquitin ligase architecture, targeting, and activation. cell. ; ( ): - . epub / / . doi: . /j.cell. . . .
. lee j, zhou p. dcafs, the missing link of the cul -ddb ubiquitin ligase. molecular cell. ; ( ): - . epub / / . doi: . /j.molcel. . . . . angers s, li t, yi x, maccoss mj, moon rt, zheng n. molecular architecture and assembly of the ddb -cul a ubiquitin ligase machinery. nature. ; ( ): - . epub / / . doi: . /nature . . scrima a, konickova r, czyzewski bk, kawasaki y, jeffrey pd, groisman r, et al. structural basis of uv dna-damage recognition by the ddb -ddb complex. cell. ; ( ): - . epub / / . doi: . /j.cell. . . . . zimmerman es, schulman ba, zheng n. structural assembly of cullin-ring ubiquitin ligase complexes. current opinion in structural biology. ; ( ): - . epub / / . doi: . /j.sbi. . . . . andrejeva j, young df, goodbourn s, randall re. degradation of stat and stat by the v proteins of simian virus and human parainfluenza virus type , respectively: consequences for virus replication in the presence of alpha/beta and gamma interferons. journal of virology. ; ( ): - . doi: . /jvi. . . - . . . li t, chen x, garbutt kc, zhou p, zheng n. structure of ddb in complex with a paramyxovirus v protein: viral hijack of a propeller cluster in ubiquitin ligase. cell. ; ( ): - . epub / / . doi: . /j.cell. . . . . trilling m, le vt, fiedler m, zimmermann a, bleifuss e, hengel h. identification of dna-damage dna-binding protein as a conditional essential factor for cytomegalovirus replication in interferon-gamma-stimulated cells. plos pathogens. ; ( ):e . doi: . /journal.ppat. . . paradkar pn, duchemin jb, rodriguez-andres j, trinidad l, walker pj. cullin is pro-viral during west nile virus infection of culex mosquitoes. plos pathogens. ; ( ):e . doi: . /journal.ppat. . . decorsiere a, mueller h, van breugel pc, abdul f, gerossier l, beran rk, et al. hepatitis b virus x protein identifies the smc / complex as a host restriction factor. nature. ; ( ): - . doi: . /nature . .
murphy cm, xu y, li f, nio k, reszka-blanco n, li x, et al. hepatitis b virus x protein promotes degradation of smc / to enhance hbv replication. cell reports. ; ( ): - . doi: . /j.celrep. . . . . lim es, fregoso oi, mccoy co, matsen fa, malik hs, emerman m. the ability of primate lentiviruses to degrade the monocyte restriction factor samhd preceded the birth of the viral accessory protein vpx. cell host & microbe. ; ( ): - . epub / / . doi: . /j.chom. . . . . romani b, cohen ea. lentivirus vpr and vpx accessory proteins usurp the cullin -ddb (dcaf ) e ubiquitin ligase. current opinion in virology. ; ( ): - . epub / / . doi: . /j.coviro. . . . . fabryova h, strebel k. vpr and its cellular interaction partners: r we there yet? cells. ; ( ). doi: . /cells . . greenwood ejd, williamson jc, sienkiewicz a, naamati a, matheson nj, lehner pj. promiscuous targeting of cellular proteins by vpr drives systems-level proteomic remodeling in hiv- infection. cell reports. ; ( ): - e . doi: . /j.celrep. . . . . schrofelbauer b, yu q, zeitlin sg, landau nr. human immunodeficiency virus type vpr induces the degradation of the ung and smug uracil-dna glycosylases. journal of virology. ; ( ): - . doi: . /jvi. . . - . . . lahouassa h, blondot ml, chauveau l, chougui g, morel m, leduc m, et al. hiv- vpr degrades the hltf dna translocase in t cells and macrophages. proceedings of the national academy of sciences of the united states of america. ; ( ): - . doi: . /pnas. . . laguette n, bregnard c, hue p, basbous j, yatim a, larroque m, et al. premature activation of the slx complex by vpr promotes g /m arrest and escape from innate immune sensing. cell. ; ( - ): - . epub / / . doi: . /j.cell. . . . .
zhou x, delucia m, ahn j. slx -slx protein-independent down-regulation of mus -eme protein by hiv- viral protein r (vpr). the journal of biological chemistry. ; ( ): - . doi: . /jbc.m . . . romani b, shaykh baygloo n, aghasadeghi mr, allahbakhshi e. hiv- vpr protein enhances proteasomal degradation of mcm dna replication factor through the cul -ddb [vprbp] e ubiquitin ligase to induce g /m cell cycle arrest. the journal of biological chemistry. ; ( ): - . doi: . /jbc.m . . . lv l, wang q, xu y, tsao lc, nakagawa t, guo h, et al. vpr targets tet for degradation by crl (vprbp) e ligase to sustain il- expression and enhance hiv- replication. molecular cell. ; ( ): - e . doi: . /j.molcel. . . . . su j, rui y, lou m, yin l, xiong h, zhou z, et al. hiv- /siv vpx targets a novel functional domain of sting to selectively inhibit cgas-sting-mediated nf-kappab signalling. nat microbiol. ; ( ): - . doi: . /s - - - . . chougui g, munir-matloob s, matkovic r, martin mm, morel m, lahouassa h, et al. hiv- /siv viral protein x counteracts hush repressor complex. nat microbiol. ; ( ): - . doi: . /s - - - . . yurkovetskiy l, guney mh, kim k, goh sl, mccauley s, dauphin a, et al. primate immunodeficiency virus proteins vpx and vpr counteract transcriptional repression of proviruses by the hush complex. nat microbiol. ; ( ): - . doi: . /s - - -x. . hrecka k, hao c, gierszewska m, swanson sk, kesik-brodacka m, srivastava s, et al. vpx relieves inhibition of hiv- infection of macrophages mediated by the samhd protein. nature. ; ( ): - . epub / / . doi: . /nature . . laguette n, sobhian b, casartelli n, ringeard m, chable-bessia c, segeral e, et al.
samhd is the dendritic- and myeloid-cell-specific hiv- restriction factor counteracted by vpx. nature. ; ( ): - . epub / / . doi: . /nature . . powell rd, holland pj, hollis t, perrino fw. aicardi-goutieres syndrome gene and hiv- restriction factor samhd is a dgtp-regulated deoxynucleotide triphosphohydrolase. the journal of biological chemistry. ; ( ): - . epub / / . doi: . /jbc.c . . . goldstone dc, ennis-adeniran v, hedden jj, groom hc, rice gi, christodoulou e, et al. hiv- restriction factor samhd is a deoxynucleoside triphosphate triphosphohydrolase. nature. ; ( ): - . epub / / . doi: . /nature . . zhu c, gao w, zhao k, qin x, zhang y, peng x, et al. structural insight into dgtp-dependent activation of tetrameric samhd deoxynucleoside triphosphate triphosphohydrolase. nature communications. ; : . epub / / . doi: . /ncomms . . kim b, nguyen la, daddacha w, hollenbaugh ja. tight interplay among samhd protein level, cellular dntp levels, and hiv- proviral dna synthesis kinetics in human primary monocyte-derived macrophages. the journal of biological chemistry. ; ( ): - . epub / / . doi: . /jbc.c . . . lahouassa h, daddacha w, hofmann h, ayinde d, logue ec, dragin l, et al. samhd restricts the replication of human immunodeficiency virus type by depleting the intracellular pool of deoxynucleoside triphosphates. nature immunology. ; ( ): - . epub / / . doi: . /ni. . . st gelais c, de silva s, amie sm, coleman cm, hoy h, hollenbaugh ja, et al. samhd restricts hiv- infection in dendritic cells (dcs) by dntp depletion, but its expression in dcs and primary cd + t-lymphocytes cannot be upregulated by interferons. retrovirology. ; : . epub / / . doi: . / - - - . .
rehwinkel j, maelfait j, bridgeman a, rigby r, hayward b, liberatore ra, et al. samhd - dependent retroviral control and escape in mice. the embo journal. ; ( ): - . epub / / . doi: . /emboj. . . . morris er, taylor ia. the missing link: allostery and catalysis in the anti-viral protein samhd . biochem soc trans. ; ( ): - . doi: . /bst . . baldauf hm, pan x, erikson e, schmidt s, daddacha w, burggraf m, et al. samhd restricts hiv- infection in resting cd (+) t cells. nature medicine. ; ( ): - . epub / / . doi: . /nm. . . shingai m, welbourn s, brenchley jm, acharya p, miyagi e, plishka rj, et al. the expression of functional vpx during pathogenic sivmac infections of rhesus macaques suppresses samhd in cd + memory t cells. plos pathogens. ; ( ):e . doi: . /journal.ppat. . . fregoso oi, ahn j, wang c, mehrens j, skowronski j, emerman m. evolutionary toggling of vpx/vpr specificity results in divergent recognition of the restriction factor samhd . plos pathogens. ; ( ):e . epub / / . doi: . /journal.ppat. . . schwefel d, groom hc, boucherit vc, christodoulou e, walker pa, stoye jp, et al. structural basis of lentiviral subversion of a cellular protein degradation pathway. nature. ; ( ): - . epub / / . doi: . /nature . . schwefel d, boucherit vc, christodoulou e, walker pa, stoye jp, bishop kn, et al. molecular determinants for recognition of divergent samhd proteins by the lentiviral accessory protein vpx. cell host & microbe. ; ( ): - . epub / / . doi: . /j.chom. . . . . wu y, koharudin lm, mehrens j, delucia m, byeon ch, byeon ij, et al. structural basis of clade- specific engagement of samhd (sterile alpha motif and histidine/aspartate-containing protein ) restriction factors by lentiviral viral protein x (vpx) virulence factors. the journal of biological chemistry. ; ( ): - . doi: . /jbc.m . . . spragg cj, emerman m. antagonism of samhd is actively maintained in natural infections of simian immunodeficiency virus. 
proceedings of the national academy of sciences of the united states of america. ; ( ): - . epub / / . doi: . /pnas. . . wu y, zhou x, barnes co, delucia m, cohen ae, gronenborn am, et al. the ddb -dcaf -vpr-ung crystal structure reveals how hiv- vpr steers human ung toward destruction. nature structural & molecular biology. ; ( ): - . doi: . /nsmb. . . enchev ri, schulman ba, peter m. protein neddylation: beyond cullin-ring ligases. nature reviews molecular cell biology. ; ( ): - . doi: . /nrm . . schneider m, belsom a, rappsilber j. protein tertiary structure by crosslinking/mass spectrometry. trends in biochemical sciences. ; ( ): - . epub / / . doi: . /j.tibs. . . . . duda dm, borg la, scott dc, hunt hw, hammel m, schulman ba. structural insights into nedd activation of cullin-ring ligases: conformational control of conjugation. cell. ; ( ): - . doi: . /j.cell. . . . . fischer es, bohm k, lydeard jr, yang h, stadler mb, cavadini s, et al. structure of the ddb -crbn e ubiquitin ligase in complex with thalidomide. nature. ; ( ): - . doi: . /nature . . delucia m, mehrens j, wu y, ahn j. hiv- and sivmac accessory virulence factor vpx down-regulates samhd enzyme catalysis prior to proteasome-dependent degradation. the journal of biological chemistry. ; ( ): - . doi: . /jbc.m . . . berger g, lawrence m, hue s, neil sj. g /m cell cycle arrest correlates with primate lentiviral vpr interaction with the slx complex. journal of virology. . epub / / . doi: . /jvi. - . . guenzel ca, herate c, benichou s. hiv- vpr-a still "enigmatic multitasker". front microbiol. ; : . doi: . /fmicb. . . . stivahtis gl, soares ma, vodicka ma, hahn bh, emerman m.
conservation and host specificity of vpr-mediated cell cycle arrest suggest a fundamental role in primate lentivirus evolution and biology. journal of virology. ; ( ): - . . planelles v, jowett jb, li qx, xie y, hahn b, chen is. vpr-induced cell cycle arrest is conserved among primate lentiviruses. journal of virology. ; ( ): - . . schapira m, calabrese mf, bullock an, crews cm. targeted protein degradation: expanding the toolbox. nat rev drug discov. ; ( ): - . doi: . /s - - -y. . hanzl a, winter ge. targeted protein degradation: current and future challenges. curr opin chem biol. ; : - . doi: . /j.cbpa. . . . . baek k, krist dt, prabu jr, hill s, klugel m, neumaier lm, et al. nedd nucleates a multivalent cullin-ring-ube d ubiquitin ligation assembly. nature. ; ( ): - . doi: . /s - - -y. . zhao y, chapman da, jones im. improving baculovirus recombination. nucleic acids research. ; ( ):e -. epub / / . . wilkins mr, gasteiger e, bairoch a, sanchez jc, williams kl, appel rd, et al. protein identification and analysis tools in the expasy server. methods mol biol. ; : - . epub / / . doi: . / - - - : . . kabsch w. xds. acta crystallographica section d, biological crystallography. ; (pt ): - . epub / / . doi: . /s . . vagin a, teplyakov a. molecular replacement with molrep. acta crystallographica section d, biological crystallography. ; (pt ): - . epub / / . doi: . /s . . emsley p, cowtan k. coot: model-building tools for molecular graphics. acta crystallographica section d, biological crystallography. ; (pt pt ): - . epub / / . doi: . /s . . liebschner d, afonine pv, baker ml, bunkoczi g, chen vb, croll ti, et al.
macromolecular structure determination using x-rays, neutrons and electrons: recent developments in phenix. acta crystallogr d struct biol. ; (pt ): - . epub / / . doi: . /s . . sparta km, krug m, heinemann u, mueller u, weiss ms. xdsapp . . journal of applied crystallography. ; ( ): - . doi: . /s . . mccoy aj, grosse-kunstleve rw, adams pd, winn md, storoni lc, read rj. phaser crystallographic software. journal of applied crystallography. ; ( ): - . doi: . /s . . kuroki r, weaver lh, matthews bw. structural basis of the conversion of t lysozyme into a transglycosidase by reengineering the active site. proceedings of the national academy of sciences of the united states of america. ; ( ): - . epub / / . doi: . /pnas. . . . . morellet n, bouaziz s, petitjean p, roques bp. nmr structure of the hiv- regulatory protein vpr. journal of molecular biology. ; ( ): - . epub / / . . carragher b, kisseberth n, kriegman d, milligan ra, potter cs, pulokas j, et al. leginon: an automated system for acquisition of images from vitreous ice specimens. j struct biol. ; ( ): - . epub / / . doi: . /jsbi. . . . suloway c, pulokas j, fellmann d, cheng a, guerra f, quispe j, et al. automated molecular microscopy: the new leginon system. j struct biol. ; ( ): - . epub / / . doi: . /j.jsb. . . . . moriya t, saur m, stabrin m, merino f, voicu h, huang z, et al. high-resolution single particle analysis from electron cryo-microscopy images using sphire. j vis exp. ;( ). epub / / . doi: . / . . grant t, rohou a, grigorieff n. cistem, user-friendly software for single-particle image processing. elife. ; . epub / / . doi: . /elife. . . zivanov j, nakane t, forsberg bo, kimanius d, hagen wj, lindahl e, et al.
new tools for automated high-resolution cryo-em structure determination in relion- . elife. ; . epub / / . doi: . /elife. . . zheng sq, palovcak e, armache jp, verba ka, cheng y, agard da. motioncor : anisotropic correction of beam-induced motion for improved cryo-electron microscopy. nat methods. ; ( ): - . epub / / . doi: . /nmeth. . . mindell ja, grigorieff n. accurate determination of local defocus and specimen tilt in electron microscopy. j struct biol. ; ( ): - . epub / / . doi: . /s - ( ) - . . punjani a, rubinstein jl, fleet dj, brubaker ma. cryosparc: algorithms for rapid unsupervised cryo-em structure determination. nat methods. ; ( ): - . epub / / . doi: . /nmeth. . . pettersen ef, goddard td, huang cc, couch gs, greenblatt dm, meng ec, et al. ucsf chimera--a visualization system for exploratory research and analysis. j comput chem. ; ( ): - . epub / / . doi: . /jcc. . . hayward s, lee ra. improvements in the analysis of domain motions in proteins from conformational change: dyndom version . . j mol graph model. ; ( ): - . epub / / . doi: . /s - ( ) - . . krissinel e, henrick k. inference of macromolecular assemblies from crystalline state. journal of molecular biology. ; ( ): - . epub / / . doi: . /j.jmb. . . . . madeira f, park ym, lee j, buso n, gur t, madhusoodanan n, et al. the embl-ebi search and sequence analysis tools apis in . nucleic acids research. ; (w ):w -w . epub / / . doi: . /nar/gkz . . nicholas kb, nicholas jr., h. b., deerfield ii., d. w. genedoc: analysis and visualization of genetic variation. embnetnews. ; ( ): - . . shevchenko a, tomas h, havlis j, olsen jv, mann m. in-gel digestion for mass spectrometric characterization of proteins and proteomes. nature protocols.
; ( ): - . epub / / . doi: . /nprot. . . . rappsilber j, ishihama y, mann m. stop and go extraction tips for matrix-assisted laser desorption/ionization, nanoelectrospray, and lc/ms sample pretreatment in proteomics. anal chem. ; ( ): - . epub / / . doi: . /ac i. . kolbowski l, mendes ml, rappsilber j. optimizing the parameters governing the fragmentation of cross-linked peptides in a tribrid mass spectrometer. anal chem. ; ( ): - . epub / / . doi: . /acs.analchem. b . . mendes ml, fischer l, chen za, barbon m, o'reilly fj, giese sh, et al. an integrated workflow for crosslinking mass spectrometry. mol syst biol. ; ( ):e . epub / / . doi: . /msb. . . fischer l, rappsilber j. quirks of error estimation in cross-linking/mass spectrometry. anal chem. ; ( ): - . epub / / . doi: . /acs.analchem. b . . yang j, zhang y. protein structure and function prediction using i-tasser. curr protoc bioinformatics. ; : - . epub / / . doi: . / .bi s . . webb b, sali a. comparative protein structure modeling using modeller. curr protoc bioinformatics. ; : - . epub / / . doi: . /cpbi. . . van zundert gc, trellet m, schaarschmidt j, kurkcuoglu z, david m, verlato m, et al. the disvis and powerfit web servers: explorative and integrative modeling of biomolecular complexes. journal of molecular biology. ; ( ): - . epub / / . doi: . /j.jmb. . . . . perez-riverol y, csordas a, bai j, bernal-llinares m, hewapathirana s, kundu dj, et al. the pride database and related tools and resources in : improving support for quantification data. nucleic acids research. ; (d ):d -d . epub / / . doi: . /nar/gky .
figures
fig . biochemical analysis of vprmus-induced crl dcaf specificity redirection.
(a) gf analysis of in vitro reconstitution of protein complexes containing ddb /dcaf -ctd, vprmus and samhd constructs. a schematic of the samhd constructs is shown above the chromatograms. sam – sterile α-motif domain, hd – histidine-aspartate domain, t l – t lysozyme. (b) sds-page analysis of fractions collected during gf runs in a, boxes are colour-coded with respect to the chromatograms. note that during preparation of the gf run containing samhd -Δctd (green trace), the gst-affinity tag, which forms dimers in solution, was not removed completely from ddb . accordingly, the gf trace contains an additional dimeric gst-ddb /dcaf -ctd/vprmus component in fractions - . (c-f) in vitro ubiquitylation reactions with purified protein components in the absence (c) or presence (d-f) of vprmus, with the indicated samhd constructs as substrate. reactions were stopped after the indicated times, separated on sds-page and visualised by staining.
fig . crystal structure of the ddb /dcaf -ctd/vprmus complex. (a) overall structure of the complex in two views. dcaf -ctd is shown as grey cartoon and semi-transparent surface. vprmus is shown as a dark green cartoon with the co-ordinated zinc ion shown as grey sphere. t l and ddb have been omitted for clarity. (b) superposition of apo-dcaf -ctd (light blue cartoon) with vprmus-bound dcaf -ctd (grey/green cartoon). only dcaf -ctd regions with significant structural differences between apo- and vprmus-bound forms are shown. disordered loops are indicated as dashed lines. (c) comparison of the binary vprmus/dcaf -ctd and ternary vpxsm/dcaf -ctd/samhd -ctd complexes. for dcaf -ctd, only the n-terminal “acidic loop” region is shown.
vprmus, dcaf -ctd and bound zinc are coloured as in a; vpxsm is represented as orange cartoon and samhd -ctd as pink cartoon. selected vpr/vpx/dcaf -ctd side chains are shown as sticks, and electrostatic interactions between these side chains are indicated as dotted lines. (d) in vitro reconstitution of protein complexes containing ddb /dcaf -ctd/vprmus or the vprmus r e/r e mutant, and samhd , analysed by analytical gf. sds-page analysis of corresponding gf fractions is shown next to the chromatogram.
fig . mechanism of samhd -ctd recruitment by vprmus. (a) two views of the cryo-em reconstruction of the crl -nedd dcaf -ctd/vprmus/samhd core. the crystal structure of the ddb /dcaf -ctd/vprmus complex was fitted as a rigid body into the cryo-em density and is shown in the same colours as in fig a. the ddb bpb model and density was removed for clarity. the red arrows mark additional density on the upper surface of the vprmus helix bundle. (b) schematic representation of sulfo-sda cross-links (grey lines) between crl dcaf /vprmus and samhd , identified by clms. proteins are colour-coded as in a, cul is coloured orange, samhd black/white. samhd -ctd is highlighted in red, and cross-links to samhd -ctd are highlighted in violet. (c) the accessible interaction space of samhd -ctd, calculated by the disvis server [ ], consistent with at least of observed cross-links, is visualised as grey mesh.
dcaf -ctd and vprmus are oriented and coloured as in a. (d) detailed view of the samhd -ctd electron density. the model is in the same orientation as in a, left panel. selected vprmus residues w and a , which are in close contact to the additional density, are shown as red space-fill representation. (e) in vitro reconstitution of protein complexes containing ddb /dcaf -ctd, vprmus or the vprmus w a/a w mutant, and samhd , assessed by analytical gf. sds-page analysis of corresponding gf fractions is shown below the chromatogram.
fig . variability of neo-substrate recognition in vpx/vpr proteins. comparison of neo-substrate recognition modes of vprmus (a), vpxsm (b), vpxmnd (c) and vprhiv- (d) proteins. dcaf -ctd is shown as grey cartoon and semi-transparent surface, vprmus – green, vpxsm – orange, vpxmnd – blue and vprhiv- – light brown are shown as cartoon. models of the recruited ubiquitylation substrates are shown as strongly filtered, semi-transparent calculated electron density maps with the following colouring scheme: samhd -ctd bound to vprmus – yellow, samhd -ctd (bound to vpxsm, pdb cc ) [ ] – mint green, samhd -ntd (vpxmnd , pdb aja) [ ] – magenta, ung (vprhiv- , pdb jk ) [ ] – light violet.
fig . cryo-em analysis of crl -nedd dcaf -ctd conformational states. (a) two views of an overlay of crl -nedd dcaf -ctd/vprmus/samhd cryo-em reconstructions (conformational state- – light green, state- – salmon, state- – purple). the portions of the densities corresponding to ddb bpa/bpc, dcaf -ctd and vprmus have been superimposed. (b) two views of a superposition of ddb /dcaf -ctd/vprmus and cul /roc (pdb hye) [ ] molecular models, which have been fitted as rigid bodies to the corresponding cryo-em densities; the models are oriented as in a. ddb /dcaf -ctd/vprmus is shown as in fig a, cul is shown as cartoon, coloured as in a and roc is shown as cyan cartoon. cryo-em density corresponding to samhd -ctd is shown in yellow, to illustrate the samhd -ctd binding site in the context of the whole crl assembly.
fig . schematic illustration of structural plasticity in vprmus-modified crl dcaf -ctd, and implications for ubiquitin transfer. (a) rotation of the crl stalk increases the space accessible to catalytic elements at the distal tip of the stalk, forming a ubiquitylation zone around the core. (b) modification of cul -whb with nedd leads to increased mobility of these distal stalk elements (cul -whb, roc ring domain) [ ], further extending the ubiquitylation zone and activating the formation of a catalytic assembly for ubiquitin transfer (see also d) [ ].
(c) flexible tethering of samhd to the core by vprmus places the bulk of samhd in the ubiquitylation zone and optimises surface accessibility. (d) dynamic processes a-c together create numerous possibilities for assembly of the catalytic machinery (nedd -cul -whb, roc , ubiquitin-(ubi-)charged e ) on surface-exposed samhd lysine side chains. here, three of these possibilities are exemplified schematically. in this way, ubiquitin coverage on samhd is maximised.
a genome wide copper-sensitized screen identifies novel regulators of mitochondrial cytochrome c oxidase activity genetic regulators of mitochondrial copper a genome wide copper-sensitized screen identifies novel regulators of mitochondrial cytochrome c oxidase activity natalie m. garza , aaron t. griffin , , mohammad zulkifli , chenxi qiu , , craig d. kaplan , , vishal m. gohil* department of biochemistry and biophysics, ms , texas a&m university, college station, tx , usa present address: department of systems biology, columbia university, new york, ny , usa present address: department of medicine, division of translational therapeutics, beth israel deaconess medical center, harvard medical school, boston, ma , usa present address: department of biological sciences, university of pittsburgh, pittsburgh, pa , usa *to whom the correspondence should be addressed: vishal m. gohil, old main drive, ms , texas a&m university, college station, tx usa; email: vgohil@tamu.edu; tel: ( ) - ; fax: ( ) - running title: genetic regulators of mitochondrial copper keywords: copper, mitochondria, vacuole, cytochrome c oxidase, ph, ap- , rim , rim .cc-by-nc-nd .
international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / genetic regulators of mitochondrial copper abstract copper is essential for the activity and stability of cytochrome c oxidase (cco), the terminal enzyme of the mitochondrial respiratory chain. loss-of-function mutations in genes required for copper transport to cco result in fatal human disorders. despite the fundamental importance of copper in mitochondrial and organismal physiology, systematic characterization of genes that regulate mitochondrial copper homeostasis is lacking. to identify genes required for mitochondrial copper homeostasis, we performed a genome-wide copper-sensitized screen using a dna-barcoded yeast deletion library. our screen recovered a number of genes known to be involved in cellular copper homeostasis while revealing genes previously not linked to mitochondrial copper biology. these newly identified genes include the subunits of the adaptor protein complex (ap- ) and two components of the cellular ph-sensing pathway, rim and rim , both of which are known to affect vacuolar function. we find that ap- and the rim mutants impact mitochondrial cco function by maintaining vacuolar acidity. cco activity of these mutants could be rescued by either restoring vacuolar ph or by supplementing growth media with additional copper. consistent with these genetic data, pharmacological inhibition of the vacuolar proton pump leads to decreased mitochondrial copper content and a concomitant decrease in cco abundance and activity.
taken together, our study uncovered a number of novel genetic regulators of mitochondrial copper homeostasis and provided a mechanism by which vacuolar ph impacts mitochondrial respiration through copper homeostasis. introduction copper is an essential trace metal that serves as a cofactor for a number of enzymes in various biochemical processes, including mitochondrial bioenergetics ( ). for example, copper is essential for the activity of cytochrome c oxidase (cco), the evolutionarily conserved enzyme of the mitochondrial respiratory chain and the main site of cellular respiration ( ). cco metalation requires transport of copper to mitochondria followed by its insertion into cox and cox , the two copper-containing subunits of cco ( ). genetic defects that prevent copper delivery to cco disrupt its assembly and activity resulting in rare but fatal infantile disorders ( , , ). intracellular trafficking of copper poses a challenge because of the high reactivity of this transition metal. copper in an aqueous environment of the cell can generate deleterious reactive oxygen species via fenton chemistry ( ) and can inactivate other metalloproteins by mismetallation ( ). consequently, organisms must tightly control copper import and trafficking to subcellular compartments to ensure proper cuproprotein biogenesis while preventing its toxicity. indeed, aerobic organisms have evolved highly conserved proteins to import and distribute copper to cuproenzymes in cells ( ). extracellular copper is imported by plasma membrane copper transporters and is immediately bound to metallochaperones atx and ccs for its delivery to different cuproenzymes residing in the golgi and cytosol, respectively ( ). however, copper transport to the mitochondria is not well understood. a non-proteinaceous ligand, whose molecular identity remains unknown, has been proposed to transport cytosolic copper to the mitochondria ( ), where it is stored in the matrix ( ).
this mitochondrial matrix pool of copper is the main source of copper ions that are delivered to cco subunits in a particularly complex process requiring multiple metallochaperones and thiol reductases ( , , ). specifically, copper from the mitochondrial matrix is exported to the intermembrane space via a yet unidentified transporter, where it is inserted into the cco subunits by metallochaperones cox , sco , and cox that operate in a bucket-brigade manner ( ). the copper-transporting function of metallochaperones requires disulfide reductase activities of sco and coa , respectively ( , ). in addition to the mitochondria, vacuoles in yeast and vacuole-like lysosomes in higher eukaryotes have been identified as critical storage sites and regulators of cellular copper homeostasis ( - ). copper enters the vacuole by an unknown mechanism and is proposed to be stored as cu(ii) coordinated to polyphosphate ( ). depending on the cellular requirement, vacuolar copper is reduced to cu(i), allowing its mobilization and export through ctr ( , ). currently, the complete set of factors regulating the distribution of copper to mitochondria remains unknown. here, we sought to identify regulators of mitochondrial copper homeostasis by exploiting the copper requirement of cco in a genome-wide screen using a barcoded yeast deletion library. our screen was motivated by prior observations that respiratory growth of yeast mutants such as coa Δ can be rescued by copper supplementation in the media ( - ).
thus, we designed a copper-sensitized screen to identify yeast mutants whose growth can be rescued by addition of copper in the media. our screen recovered coa and other genes with known roles in copper metabolism while uncovering genes involved in vacuolar function as regulators of mitochondrial copper homeostasis. here, we have highlighted the roles of two cellular pathways, the adaptor protein complex (ap- ) and the ph-sensing pathway rim , that converge on vacuolar function as important factors regulating cco biogenesis by maintaining mitochondrial copper homeostasis. results a genome-wide copper-sensitized screen using barcoded yeast deletion mutant library we chose the yeast, saccharomyces cerevisiae, to screen for genes that impact mitochondrial copper homeostasis because it can tolerate mutations that inactivate mitochondrial respiration by surviving on glycolysis. this enables the discovery of novel regulators of mitochondrial copper metabolism whose knockout is expected to result in a defect in aerobic energy generation ( ). yeast cultured in glucose-containing media (ypd) uses glycolytic fermentation as the primary source for cellular energy; however, in glycerol/ethanol-containing non-fermentable media (ypge), yeast must utilize the mitochondrial respiratory chain and its terminal cuproenzyme, cco, for energy production. based on the nutrient-dependent utilization of different energy-generating pathways, we expect that deletion of genes required for respiratory growth will specifically reduce growth in non-fermentable (ypge) medium but will not impair growth of those mutants in fermentable (ypd) medium. moreover, if respiratory deficiency in yeast mutants is caused by defective copper delivery to mitochondria, then these mutants may be amenable to rescue via copper supplementation in ypge respiratory growth media (fig. ).
therefore, to identify genes required for copper-dependent respiratory growth, we cultured the yeast deletion mutants in ypd and ypge with or without μm cucl supplementation (fig. ). our genome-wide yeast deletion mutant library was derived from the variomics library reported previously ( ). it is composed of viable haploid yeast mutants, where each mutant has one nonessential gene replaced with the selection marker kanmx and two unique flanking sequences (fig. ). these flanking sequences, labeled “up” and “dn”, contain universal priming sites as well as a -bp barcode sequence that is specific to each deletion strain. this unique barcode sequence allows for the quantification of individual strain relative abundance within a pool of competitively grown strains by dna barcode sequencing ( ). here, we utilized this dna barcode sequencing approach to quantify the relative fitness of each mutant grown in ypd and ypge ± cu to early stationary phase (fig. ). genes required for respiratory growth we began the screen by identifying mutant strains with respiratory deficiency since perturbation of mitochondrial copper metabolism is expected to compromise aerobic energy metabolism. to identify mutants with this growth phenotype, we compared the relative abundance of each barcode in ypd to that of ypge using a t-score based on welch’s t-test. the t-score provides a quantitative measure of the difference in the abundance of a given mutant in two growth conditions.
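the t-score comparison described here can be illustrated with a minimal sketch. it assumes the score is computed as ypge minus ypd, so that poor respiratory growth gives a negative value; the mutant names, replicate counts and abundance values are hypothetical and are not taken from the screen, and the authors' actual normalization pipeline is not shown in the text.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(ypge, ypd):
    """Welch's t statistic for two samples with unequal variances.

    Assumed sign convention: mean(ypge) - mean(ypd), so a mutant that
    grows poorly in respiratory medium gets a negative score.
    """
    standard_error = sqrt(variance(ypge) / len(ypge) + variance(ypd) / len(ypd))
    return (mean(ypge) - mean(ypd)) / standard_error


# Hypothetical normalized barcode abundances from replicate cultures.
abundance = {
    "mutant_a": {"ypd": [1.00, 1.10, 0.90], "ypge": [0.20, 0.30, 0.25]},
    "mutant_b": {"ypd": [1.00, 0.95, 1.05], "ypge": [1.00, 1.10, 0.90]},
}

scores = {name: welch_t(v["ypge"], v["ypd"]) for name, v in abundance.items()}

# Rank from most negative (strongest respiratory defect) to most positive.
ranked = sorted(scores, key=scores.get)
```

in this toy example, mutant_a is depleted in ypge and therefore ranks first, while mutant_b, whose abundance is similar in both media, scores close to zero.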
a negative t score identifies mutants that grow poorly in respiratory conditions; conversely, a positive t score identifies mutants with better competitive growth in respiratory conditions. we rank ordered all the mutants from negative to positive t scores and found that the lower tail of the distribution was enriched in genes with known roles in respiratory chain function as expected (fig. a; supplementary table ). the top “hits” representing mutants with most negative t score included coq , cox a, rcf , coa , and pet genes that are involved in coenzyme q and respiratory complex iv function (fig. a). to more systematically identify cellular pathways that were enriched for reduced respiratory growth, we performed gene ontology analysis using an online tool - gene ontology enrichment analysis and visualization (gorilla) ( ). the gene ontology (go) analysis identified mitochondrial respiratory chain complex assembly (p-value: . e- ) and cytochrome oxidase assembly (p-value: . e- ) as the top-scoring biological process categories (fig. b) and mitochondrial part (p-value: . e- ) and mitochondrial inner membrane (p-value: . e- ) as the top-scoring cellular component categories (fig. c). this unbiased analysis identified the expected pathways and processes, validating our screening results. we further benchmarked the performance of our screen by determining the enrichment of genes encoding mitochondria-localized and oxidative phosphorylation (oxphos) proteins at three different p-value thresholds (p< . , p< . , and p< . ) (supplementary fig. ). we observed that at a p-value of < . , ~ % of the genes encoded mitochondrially localized proteins, of which ~ % were oxphos proteins (supplementary fig. ; supplementary table ). the percentage of mitochondria-localized and oxphos genes increased progressively as we increased the stringency of our analysis by decreasing the significance cut-off from p-value of . to . (supplementary fig. ).
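to make the go enrichment p-values above more concrete, the following is a minimal sketch of a fixed-cutoff hypergeometric enrichment test. this is a simplification chosen for illustration: to our knowledge gorilla works on ranked gene lists with a minimum hypergeometric statistic, which is not reproduced here, and all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total, annotated, hits, annotated_hits):
    """Upper-tail hypergeometric probability P(X >= annotated_hits).

    total          -- number of genes in the background (e.g. all mutants)
    annotated      -- background genes carrying the GO term
    hits           -- genes in the hit list (e.g. below a p-value cut-off)
    annotated_hits -- hit-list genes carrying the GO term
    """
    upper = min(annotated, hits)
    tail = sum(
        comb(annotated, k) * comb(total - annotated, hits - k)
        for k in range(annotated_hits, upper + 1)
    )
    return tail / comb(total, hits)


# Hypothetical counts for illustration only.
p_value = hypergeom_enrichment_p(total=5000, annotated=50, hits=200, annotated_hits=15)
```

with these made-up counts only about two annotated genes would be expected in the hit list by chance, so observing fifteen gives a very small p-value.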
a total of genes were identified to have respiratory-deficient growth at p< . , of which are known to encode mitochondrial proteins ( ); nearly half of these are oxphos proteins, from a total of known oxphos genes in yeast (supplementary fig. ; supplementary table ). as expected, the respiratory-deficient mutants included genes required for mitochondrial nadh dehydrogenase (ndi ) and oxphos complex ii, iii, iv, and v as well as genes involved in cytochrome c and ubiquinone biogenesis, which together form the mitochondrial energy-generating machinery (fig. d, supplementary table ). additionally, genes encoding tca cycle enzymes and genes involved in mitochondrial translation were also scored as hits (supplementary fig. ). surprisingly, a large fraction of genes required for respiratory growth encoded non-mitochondrial proteins involved in vesicle-mediated transport (supplementary fig. ).
pathway analysis for copper-based rescue next, we focused on identifying mutants in which copper supplementation improved their fitness in respiratory growth conditions by comparing their abundance in ypge + μm cucl versus ypge growth conditions. we rank ordered the genes from positive to negative t scores. mutants with a positive t score, which displayed improved respiratory growth upon copper supplementation, are present in the upper tail of the distribution (fig. a, supplementary table ). notably, several genes known to be involved in copper homeostasis were recovered as high scoring “hits” in our screen and were present in the expected upper tail of distribution (fig. a).
for example, we recovered ctr , which encodes the plasma membrane copper transporter ( ), atx , which encodes a metallochaperone involved in copper trafficking to the golgi body ( ), gef and kha , which encode proteins involved in copper loading into the cuproproteins in the golgi compartment ( , ), gsh and gsh , which are required for biosynthesis of the copper-binding molecule glutathione, and coa , which encodes a mitochondrial protein that we previously discovered to have a role in copper delivery to the mitochondrial cco ( , , ) (fig. a). nevertheless, for many of our other top scoring hits, evidence supporting their role in mitochondrial copper homeostasis was either limited or lacking entirely. to determine which cellular pathways are essential for maintaining copper homeostasis, we performed gene ontology analysis using gorilla. go analysis identified the biological processes golgi to vacuole transport (p-value: . e- ) and post-golgi vesicle-mediated transport (p-value: . e- ) as the most significantly enriched pathways (fig. b). additionally, the go category transition metal ion homeostasis was also among the top five significantly enriched pathways (p-value: . e- ) (fig. b). go analysis for cellular component categories identified adaptor protein complex (ap- ), which is known to transport vesicles from the golgi body to vacuole, as the top scoring cellular component (p-value: . e- ) (fig. c). all four subunits of the ap- complex (apl , apm , apl , aps ) were in the top of our rank list (fig. a, supplementary table ) ( , ). additionally, two subunits of the rim pathway (rim and rim ), both of which are linked to vacuolar function ( ), were also in our list of top-scoring genes (supplementary table ). of note, the seven major components of the rim pathway were identified as top-scoring hits for respiratory deficient growth (supplementary fig. ).
placing the hits from our screen on cellular pathways revealed a number of “hits” that were involved in golgi bud formation (sys , arf ), vesicle coating (ap- and ap- complex subunits), tethering and fusion of golgi vesicle cargo to the vacuole (vam ), and vacuolar atpase expression and assembly (rim , rim , rav ) (fig. d). we reasoned that these biological processes and cellular components were likely high scoring due to the role of the vacuole as a major storage site of intracellular metals ( ). we decided to focus on ap- and rim mutants, as these cellular components were not previously linked to mitochondrial respiration or mitochondrial copper homeostasis. ap- mutants exhibit reduced abundance of cco and v-atpase subunits to validate our screening results and to determine the specificity of the copper-based rescue of ap- mutants, we compared the respiratory growth of ap- deletion strains, aps Δ, apl Δ, and apl Δ on ypd and ypge media with or without cu, mg, or zn supplementation. each of the ap- mutants exhibited reduced respiratory growth in ypge media at °c, which was fully restored by copper, but not by magnesium or zinc (fig. a), indicating that the primary defect in these cells is dysregulated copper homeostasis. here we used °c for growth measurement to fully uncover the growth defect on solid media. the coa Δ mutant was used as a positive control because we have previously shown that respiratory growth deficiency of coa Δ can be rescued by cu supplementation ( ).
since recent work has identified the role of the yeast vacuole in mitochondrial iron homeostasis ( , ) we asked if iron supplementation could also rescue the respiratory growth of ap- mutants. unlike copper, which rescued respiratory growth of ap- mutants at μm concentration, low concentrations of iron (≤ μm) did not rescue respiratory growth; but we did find that high iron supplementation ( μm) improved their respiratory growth (supplementary fig. ). to uncover the biochemical basis of reduced respiratory growth, we focused on cox , a copper-containing subunit of cco, whose stability is dependent on copper availability and whose levels serve as a reliable proxy for mitochondrial copper content. the steady state levels of cox were modestly but consistently reduced in all four ap- mutants tested (fig. b). ap- complex function has not been directly linked to mitochondria but is linked to the trafficking of proteins from the golgi body to the vacuole. therefore, the decreased abundance of cox in ap- mutants could be due to an indirect effect involving the vesicular trafficking role of the ap- complex. a previous study has shown that the ap- complex interacts with a subunit of the v-atpase in human cells ( ). as perturbation in v-atpase function had been linked to defective respiratory growth ( - ), we wondered if ap- impacts mitochondrial function via trafficking v-atpase subunit(s) to the vacuole. to test this idea, we first measured vacuolar acidification and found that the ap- mutant, aps Δ, exhibited significantly increased vacuolar ph (fig. c). we hypothesized that the elevated vacuolar ph of aps Δ cells could be due to a perturbation in the trafficking of v-atpase subunit(s). to test this possibility, we measured the levels of v-atpase subunit vma , in wild type (wt) and aps Δ cells, by western blotting and found that vma levels were indeed reduced in the isolated vacuolar fractions of aps Δ cells but were unaffected in the whole cells (fig. d).
the decreased abundance of vma in vacuoles of yeast ap- mutant explains decreased vacuolar acidification because vma is an essential subunit of v-atpase. taken together, these results suggest that the ap- complex is required for maintaining vacuolar acidification, which in turn could impact mitochondrial copper homeostasis. genetic defects in rim pathway perturb mitochondrial copper homeostasis next, we focused on two other hits from the screen, rim and rim , which are members of the rim pathway, which has previously been linked to v-atpase expression ( - ). the activation of rim results in the increased expression of v-atpase subunits ( ). consistently, we found elevated vacuolar ph in rim Δ cells (fig. a). we then compared the respiratory growth of rim Δ and rim Δ on ypd and ypge media with or without cu, mg, or zn supplementation. consistent with our screening results, these mutants exhibited reduced respiratory growth that was fully restored by copper but not magnesium or zinc (fig. b). to directly test the roles of these genes in cellular copper homeostasis, we measured the whole-cell copper levels of rim Δ by inductively coupled plasma mass spectrometry (icp-ms). the intracellular copper levels under basal or copper-supplemented conditions in rim Δ cells were comparable to the wt cells, suggesting that the copper import or sensing machinery is not defective in this mutant (fig. c).
in contrast to the total cellular copper levels, rim Δ did exhibit significantly reduced mitochondrial copper levels, which were restored by copper supplementation (fig. d). the decrease in mitochondrial copper levels is expected to perturb the biogenesis of cco in rim Δ cells. therefore, we measured the abundance and activity of this complex by western blot analysis and enzymatic assay, respectively. consistent with the decrease in mitochondrial copper levels, rim Δ cells exhibited a reduction in the abundance of cox along with a decrease in cco activity, both of which were rescued by copper supplementation (fig. e and f). to further dissect the compartment-specific effect by which rim impacts cellular copper homeostasis, we measured the abundance and activity of sod , a mainly cytosolic cuproenzyme. we found that unlike cco, sod abundance and activity remain unchanged in rim Δ cells (supplementary fig. ). to determine if the decrease in cco activity in the absence of rim was due to its role in maintaining vacuolar ph, we manipulated vacuolar ph by changing the ph of the growth media. previously, it has been shown that vacuolar ph is influenced by the ph of the growth media through endocytosis ( , ). indeed, acidifying growth media to ph . from the basal ph of . normalized vacuolar ph of rim Δ to the wt levels and both strains exhibited lower vacuolar ph when grown in acidified media (fig. g). under these conditions of reduced vacuolar ph, the respiratory growth of rim Δ was restored to wt levels (fig. h). notably, alkaline media also reduced the respiratory growth of wt cells, though the extent of growth reduction was lower than for rim Δ, which is likely because of a fully functional v-atpase in wt cells (fig. h). to uncover the biochemical basis of the restoration of respiratory growth of rim Δ by acidified media, we measured cco enzymatic activity in wt and rim Δ cells grown in either basal or acidified growth medium (ph . and . , respectively).
consistent with the respiratory growth rescue, the cco activity was also restored in cells grown at an ambient ph of . (fig. i). notably, the restoration of respiratory growth by copper supplementation was independent of growth media ph (fig. j). taken together, these findings causally link vacuolar ph to cco activity via mitochondrial copper homeostasis. pharmacological inhibition of the v-atpase results in decreased mitochondrial copper to directly assess the role of vacuolar ph in maintaining mitochondrial copper homeostasis, we utilized concanamycin a (conca), a small molecule inhibitor of v-atpase. treating wt cells with increasing concentrations of conca led to progressively increased vacuolar ph (fig. a). notably, the increase in vacuolar ph with pharmacological inhibition of v-atpase by conca was much more pronounced (fig. a) than via genetic perturbation in aps Δ or rim Δ cells (figs. c and a). correspondingly, we observed a pronounced decrease in cco abundance and activity in conca-treated cells (fig. b, c). this decrease in abundance of cco is likely due to a reduction in mitochondrial copper levels (fig. d). these data establish the role of the vacuole in regulating mitochondrial copper homeostasis and cco function. discussion mitochondria are the major intracellular copper storage sites that harbor important cuproenzymes like cco. when faced with copper deficiency, cells prioritize mitochondrial copper homeostasis, suggesting its critical requirement for this organelle ( ).
however, the complete set of factors required for mitochondrial copper homeostasis has not been identified. here, we report a number of novel genetic regulators of mitochondrial copper homeostasis that link mitochondrial bioenergetic function with vacuolar ph. specifically, we show that when vacuolar ph is perturbed by genetic, environmental, or pharmacological factors, copper availability to the mitochondria is limited, which in turn reduces cco function and impairs aerobic growth and mitochondrial respiration. it has long been known that v-atpase mutants have severely reduced respiratory growth ( , ), and more recent high-throughput studies have corroborated these observations ( - ). however, the molecular mechanisms underlying this observation have remained obscure. recent studies have shown that a decrease in vacuolar acidity (i.e. increased vacuolar ph) perturbs cellular and mitochondrial iron homeostasis, which impairs mitochondrial respiration, as iron is also required for electron transport through the mitochondrial respiratory chain due to its role in iron-sulfur cluster biogenesis and heme biosynthesis ( , , , ). in an elegant series of experiments, hughes et al. showed that when v-atpase activity is compromised, cytosolic amino acids become elevated because vacuoles with defective ph are unable to import and store amino acids. the resulting elevation in cytosolic amino acids, particularly cysteine, is toxic to the cells because it disrupts cellular iron homeostasis and iron-dependent mitochondrial respiration ( ). although this exciting study brought us a step closer to understanding v-atpase-dependent mitochondrial function, the mechanism by which elevated cysteine perturbs iron homeostasis is still unclear.
since cysteine can strongly bind cuprous ions ( , ), sequestration of copper in the cytosol by cysteine would decrease copper availability to fet , a multi-copper oxidase required for the uptake of extracellular iron, which in turn would aggravate iron deficiency ( ). thus, a defect in cellular copper homeostasis could cause a secondary defect in iron homeostasis. consistent with this idea, we observed a rescue of ap- mutants’ respiratory growth with high iron supplementation (supplementary fig. ). interestingly, ap- has also been previously linked to vacuolar cysteine homeostasis ( ). our results showing diminished cco activity and/or cox levels in ap- , rim , and conca-treated cells (figs. b, e and f, b and c) connect vacuolar ph to mitochondrial copper biology. however, a modest decrease in cco activity may not be sufficient to reduce respiratory growth. therefore, it is very likely that the decreased respiratory growth we have observed is a result of a defect not only in copper but also in iron homeostasis. consistent with this idea, previous high-throughput studies reported sensitivity of ap- and rim pathway mutants in conditions of iron deficiency and overload ( , ). moreover, rim and rim mutants have been shown to display sensitivity to copper starvation in cryptococcus neoformans, an opportunistic fungal pathogen ( ), and partial knockdown of ap s , a subunit of ap- complex in zebrafish, sensitized developing melanocytes to hypopigmentation in low-copper environmental conditions ( ). thus, the rim pathway and the ap- pathway are linked to copper homeostasis in multiple organisms. our discovery of ap- pathway mutants and other mutants involved in the golgi-to-vacuole transport (fig. ) is also consistent with a previous genome-wide study, which identified the involvement of these genes in cu-dependent growth of yeast saccharomyces cerevisiae ( ); however, the biochemical mechanism(s) underlying the functional connection between the vacuole and mitochondrial cco have not been previously elucidated. thus, the results from our study are not only consistent with previous studies but also provide a biochemical mechanism elucidating how disruption in vacuolar ph perturbs mitochondrial respiratory function via copper-dependence of cco. interestingly, in both the genetic and pharmacological models of reduced v-atpase function, mitochondrial copper levels were reduced (fig. d and fig. d) but were not absent, suggesting that the vacuole may only partially contribute to mitochondrial cu homeostasis. supporting this hypothesis, rescue of respiratory growth by copper supplementation was successful irrespective of vacuolar ph (fig. a and j). the results of this study could also provide insights into mechanisms underlying the pathogenesis of human diseases associated with aberrant copper metabolism and/or decreased v-atpase function including alzheimer’s disease, amyotrophic lateral sclerosis (als), and parkinson’s disease ( - ). although multiple factors are known to contribute to the pathogenesis of these diseases, our study suggests disrupted mitochondrial copper homeostasis may also be an important contributing factor. in contrast to these multi-factorial diseases, pathogenic mutations in ap- subunits are known to cause hermansky-pudlak syndrome (hps), a rare autosomal disorder, which is often associated with high morbidity ( - ).
just as in yeast, ap- in humans is required for the transport of vesicles to the lysosome, which is evolutionarily and functionally related to the yeast vacuole. our study linking ap- to mitochondrial function suggests that decreased mitochondrial function could contribute to hps pathology. more generally, decreased activity of v-atpase has been linked to age-related decrease in lysosomal function ( , , ) and impaired acidification of the yeast vacuole has been shown to cause accelerated aging ( ). therefore, in addition to uncovering the fundamental aspects of cell biology of metal transport and distribution, our study suggests a possible role of mitochondrial dysfunction in multiple human disorders.

methods

yeast strains and growth conditions

individual yeast saccharomyces cerevisiae mutants used in this study were obtained from open biosystems or were constructed by one-step gene disruption using a hygromycin cassette ( ). all strains used in this study are listed in table . authenticity of yeast strains was confirmed by polymerase chain reaction (pcr)-based genotyping. yeast cells were cultured in either ypd ( % yeast extract, % peptone, and % dextrose) or ypge ( % glycerol + % ethanol) medium. solid ypd and ypge media were prepared by addition of % agar. for metal supplementation experiments, growth medium was supplemented with divalent chloride salts of cu, mn, mg, zn or feso . for growth on solid media, μl of -fold serial dilutions of pre-cultures were seeded onto ypd or ypge plates and incubated at °c for the indicated period. for growth in the liquid medium, yeast cells were pre-cultured in ypd and inoculated into ypge and grown to mid-log phase. to acidify or alkalinize liquid ypge, equivalents of hcl or naoh were added, respectively. for liquid growth assays in acidified or alkalinized ypge, cultures were grown for h before comparing growth.
for growth in the presence of concanamycin a (conca), cells were first cultured in ypd, transferred to ypge and allowed to grow for h; conca was then added and the cells were grown for a further h. growth in liquid media was monitored spectrophotometrically at nm.

construction of yeast deletion pool

the yeast deletion collection for bar-seq analysis was derived from the variomics library constructed previously ( ) and was a kind gift of xuewen pan. the heterozygous diploid deletion library was sporulated and selected in liquid haploid selection medium (sc-arg-his-leu+g +canavanine) to obtain haploid cells containing gene deletions. to do this, we followed a previously described protocol ( ), modified by adding uracil to allow the growth of the deletion library lacking ura . prior to sporulation, the library pool was grown under conditions that first allowed loss of ura plasmids and then selected for cells lacking ura plasmids. the original deletion libraries were constructed by replacing each yeast open reading frame with a kanmx cassette containing two gene-specific barcode sequences, referred to as the up tag and the dn tag because they are located upstream and downstream of the cassette, respectively ( ).

pooled growth assays

a stored glycerol stock of the haploid deletion pool containing . x cells/ml (equivalent of . optical density/ml) was thawed and approximately μl was used to inoculate ml of ypd, ypge or ypge + μm cucl media in quadruplicate in ml falcon tubes at a starting optical density of .
, which corresponded to ~ . x cells/ml. the cells were grown at °c in an incubator shaker at rpm until they reached an optical density of ~ . before harvesting. cells were pelleted by centrifugation at ×g for min, washed once with sterile water and stored at - °c. frozen cell pellets were thawed, resuspended in sterile nanopure water and counted. genomic dna was extracted from x cells using the yeastar genomic dna kit (catalog no.d ) from zymo research. the extracted dna was used as a template to amplify barcode sequences by pcr, followed by purification of amplified dna with the qiaquick pcr purification kit from qiagen. the number of pcr cycles used for amplification was determined by quantitative real time pcr such that barcode sequences were not amplified in a nonlinear way. the amplified up and dn barcode dna were purified by gel electrophoresis and sequenced on illumina hiseq with base pair, paired-end sequencing at genomics and bioinformatics service of texas a&m agrilife research.

assessing fitness of barcoded yeast strains by dna sequencing

the sequencing reads were aligned to the barcode sequences using bowtie (version . . ) with the -n flag set to . bowtie outputs were processed and counted using samtools (version . . ). barcode sequences that were shorter than nts or that mapped to multiple reference barcodes were discarded. we noted that the dn tag sequences were missing for many genes and therefore we only used up tag sequences to calculate the fitness score using t statistics.

gene ontology analysis

to identify enriched gene ontology terms, we generated a rank ordered list based on t-scores (supplementary table and ) and used the reference genome for saccharomyces cerevisiae in gorilla (http://cbl-gorilla.cs.technion.ac.il/).

cellular and mitochondrial copper measurements

cellular and mitochondrial copper levels were measured by inductively coupled plasma (icp) mass spectrometry using a nexion d instrument from perkinelmer inc.
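the fitness scoring outlined under "assessing fitness of barcoded yeast strains by dna sequencing" (barcode counts normalized to counts per million, then compared between media with welch's t statistic) can be sketched as follows. this is only a minimal illustration, not the authors' pipeline: the function names, the dictionary layout of the counts, and the sign convention (first condition minus second) are assumptions made for the example.

```python
import math
from statistics import mean, variance


def counts_per_million(counts):
    """Scale the raw barcode counts of one sample to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped barcode reads")
    return {strain: 1e6 * n / total for strain, n in counts.items()}


def welch_t(group_a, group_b):
    """Welch's two-sample t statistic (unequal variances), group_a minus group_b."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    if standard_error == 0:
        # No variation within either group: the statistic is undefined.
        return math.nan
    return (mean(group_a) - mean(group_b)) / standard_error


def fitness_scores(samples_a, samples_b):
    """Per-strain t-score comparing CPM values between two growth conditions.

    samples_a and samples_b are lists of {strain: raw_count} dictionaries,
    one dictionary per replicate culture (hypothetical input layout).
    """
    cpm_a = [counts_per_million(sample) for sample in samples_a]
    cpm_b = [counts_per_million(sample) for sample in samples_b]
    strains = set().union(*cpm_a, *cpm_b)
    return {
        strain: welch_t(
            [sample.get(strain, 0.0) for sample in cpm_a],
            [sample.get(strain, 0.0) for sample in cpm_b],
        )
        for strain in strains
    }
```

under this convention, a strain whose barcode is depleted in the first condition relative to the second receives a negative score. ranking strains by such scores gives the kind of ordered list that the gene ontology analysis takes as input.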
briefly, intact yeast cells were washed twice with ultrapure metal-free water containing μm edta (traceselect; sigma) followed by two more washes with ultrapure water to eliminate edta. for mitochondrial samples, the same procedure was performed using mm mannitol (traceselect; sigma) to maintain mitochondrial integrity. after washing, samples were weighed, digested with % nitric acid (traceselect; sigma) at °c for h, followed by h digestion with . % h o (sigma-supelco), then diluted in ultrapure water and analyzed. copper standard solutions were prepared by diluting commercially available mixed metal standards (bdh aristar plus).

subcellular fractionation

whole-cell lysates were prepared by resuspending ~ mg of yeast cells in μl sumeb buffer ( . % sodium dodecyl sulfate, m urea, mm mops, ph . , mm edta, mm phenylmethanesulfonyl fluoride [pmsf] and x edta-free protease inhibitor cocktail from roche) containing mg of acid-washed glass beads (sigma-aldrich). samples were then placed in a bead beater (mini bead beater from biospec products), which was set at maximum speed. the bead beating protocol involved five rounds, where each round lasted for s followed by s incubation on ice. lysed cells were kept on ice for min, then heated at °c for min. cell debris and glass beads were spun down at , ×g for min at °c. the supernatant was transferred to a separate tube and was used to perform sds-page/western blotting.

mitochondria were isolated as described previously ( ). briefly, . - . g of cell pellet was incubated in dtt buffer ( .
m tris-hcl, ph . , mm dtt) at °c for min. the cells were then pelleted by centrifugation at , ×g for min, resuspended in spheroplasting buffer ( . m sorbitol, mm potassium phosphate, ph . ) at ml/g and treated with mg zymolyase (us biological life sciences) per gram of cell pellet for min at °c. spheroplasts were pelleted by centrifugation at , ×g for min then homogenized in homogenization buffer ( . m sorbitol, mm tris-hcl, ph . , mm edta, mm pmsf, . % [w/v] bsa [essentially fatty acid-free, sigma-aldrich]) with strokes using a glass teflon homogenizer with pestle b. after two centrifugation steps for min at , ×g and , ×g, the final supernatant was centrifuged at , ×g for min to pellet mitochondria. mitochondria were resuspended in sem buffer ( mm sucrose, mm edta, mm mops-koh, ph . , containing x protease inhibitor cocktail from roche).

isolation of pure vacuoles was performed as previously described ( ). yeast spheroplasts were pelleted at , ×g at °c for min. dextran-mediated spheroplast lysis of g of yeast cells was performed by gently resuspending the pellet in . ml of % (w/v) ficoll in ficoll buffer ( mm pipes/koh, mm sorbitol, ph . , mm pmsf, x protease inhibitor cocktail) followed by addition of μl of . mg/ml dextran in ficoll buffer. the mixture was incubated on ice for min followed by heating at °c for s and returning the samples to ice. a step-ficoll gradient was constructed on top of the lysate with ml each of %, %, and % (w/v) ficoll in ficoll buffer. the step-gradient was centrifuged at , ×g for min at °c. vacuoles were removed from the %/ % ficoll interface.

crude cytosolic fractions used to quantify sod activity and abundance were isolated as described previously ( ). briefly, ~ mg of yeast cells were resuspended in μl of solubilization buffer ( mm potassium phosphate, ph . , mm pmsf, mm edta, x protease inhibitor cocktail, % [w/v] triton x- ) for min on ice.
the lysate was extracted by centrifugation at , ×g for min at °c, to remove the insoluble fraction. protein concentrations for all cellular fractions were determined by the bca assay (thermo scientific).

sds-page and western blotting

for sds-polyacrylamide gel electrophoresis (sds-page)/western blotting experiments, μg of protein was loaded for either whole cell lysate or mitochondrial samples, while μg of protein was used for cytosolic and vacuolar fractions. proteins were separated on either - % stain-free gels (bio-rad) or % nupage bis-tris mini protein gels (thermofisher scientific) and blotted onto polyvinylidene difluoride membranes. membranes were blocked for h in % (w/v) nonfat milk dissolved in tris-buffered saline with . % (w/v) tween (tbst-milk), followed by overnight incubation with a primary antibody in tbst-milk or tbst- % bovine serum albumin at °c. primary antibodies were used at the following dilutions: cox , : , (abcam ); por , : , (abcam ); pgk , : , (life technologies ); sod , : , ; and vma , : , (sigma h ). secondary antibodies (ge healthcare) were used at : , for h at room temperature. membranes were developed using western lightning plus-ecl (perkinelmer), or supersignal west femto (thermofisher scientific).

enzymatic activities

to measure sod activity, we used an in-gel assay as described previously ( ). μg of cytosolic protein was diluted in nativepage sample buffer (thermofisher scientific) and separated on a - % nativepage gel (thermofisher scientific) at °c. the gel was then stained with .
% (w/v) nitroblue tetrazolium, . % riboflavin for min in the dark. this solution was then replaced by % tetramethylethylenediamine for min and developed under a bright light. the gel was imaged with a bio-rad chemidoc™ mp imaging system and densitometric analysis was performed using image lab software. cco and citrate synthase enzymatic activities were measured as described previously ( ) using a biotek synergy™ mx microplate reader in a clear well plate (falcon). to measure cco activity, µg of mitochondria were resuspended in µl of cco buffer ( mm sucrose, mm potassium phosphate, ph . , mg/ml bsa) and allowed to incubate for min. the reaction was started by the addition of µl of μm oxidized cytochrome c (equine heart, sigma) and . µl of % (w/v) n-dodecyl-beta-d-maltoside. oxidation of cytochrome c was monitored at nm for min, then the reaction was inhibited by the addition of µl of mm kcn. to measure citrate synthase activity, µg of mitochondria were resuspended in µl of citrate synthase buffer ( mm tris-hcl ph . , . % [w/v] triton x- , µm , '-dithio-bis-[ -nitrobenzoic acid]) and µl of mm acetyl-coa and incubated for min. to start the reaction, µl of mm oxaloacetate was added and turn-over of acetyl-coa was monitored at nm for min. enzyme activity was normalized to that of wt for each replicate.

measuring vacuolar ph

vacuolar ph was measured using a ratiometric ph indicator dye, bcecf-am ( ′, ′-bis-( -carboxyethyl)- -(and- )-carboxyfluorescein [life technologies]) as described by ( ) using a biotek synergy™ mx microplate reader. briefly, mg of cells were resuspended in µl of ypge containing µm bcecf-am for min shaking at °c. to remove extracellular bcecf-am, cells were washed twice and resuspended in µl of fresh ypge. µl of this cell culture was added to ml of mm mes buffer, ph . or . . the fluorescence emission intensity at nm was monitored by using the excitation wavelengths and nm in a black well plate, clear bottom (falcon).
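the ratiometric readout described above can be illustrated with a short sketch: the emission intensities recorded at the two excitation wavelengths are divided, and the resulting ratio is converted to a ph value by interpolating along a calibration curve. the function names, the choice of piecewise-linear interpolation, and the calibration points used in the usage note are assumptions made for illustration; the source does not specify how the calibration curve was fitted.

```python
def fluorescence_ratio(intensity_ex_1, intensity_ex_2):
    """Ratio of the emission intensities measured at the two excitation wavelengths."""
    if intensity_ex_2 <= 0:
        raise ValueError("denominator intensity must be positive")
    return intensity_ex_1 / intensity_ex_2


def ratio_to_ph(ratio, calibration):
    """Convert a fluorescence ratio to pH by piecewise-linear interpolation.

    calibration is a list of (ratio, pH) pairs with distinct ratios, measured
    on cells equilibrated to known pH values (hypothetical input layout).
    """
    points = sorted(calibration)
    if len(points) < 2:
        raise ValueError("at least two calibration points are required")
    if not points[0][0] <= ratio <= points[-1][0]:
        raise ValueError("ratio lies outside the calibrated range")
    for (r0, ph0), (r1, ph1) in zip(points, points[1:]):
        if r0 <= ratio <= r1:
            return ph0 + (ph1 - ph0) * (ratio - r0) / (r1 - r0)
```

for example, with the made-up calibration points `[(1.0, 5.0), (2.0, 6.0), (4.0, 7.0)]`, a measured ratio of 3.0 falls halfway between the last two points and maps to ph 6.5. ratios outside the calibrated range are rejected rather than extrapolated.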
a calibration curve of the fluorescence intensity in response to ph was generated as described ( ).

statistics

t-scores for each pairwise media comparison (e.g. ypd vs. ypge) were calculated using welch’s two-sample t-test for yeast knockout barcode abundance values normalized for sample sequencing depth (i.e. counts per million). statistical analysis on bar charts was conducted using a two-sided student’s t-test. experiments were performed in three biological replicates, where biological replicates are defined as experiments performed on different days and from different starting pre-cultures. error bars represent the standard deviation, *(p< . ), **(p< . ), ***(p< . ).

references

 . kim, b. e., nevitt, t., and thiele, d. j. ( ) mechanisms for copper acquisition, distribution, and regulation. nat. chem. biol. , - . little, a. g., lau, g., mathers, k. e., leary, s. c., and moyes, c. d. ( ) comparative biochemistry of cytochrome c oxidase in animals. comp. biochem. physiol. b. biochem. mol. biol. , - . cobine, p. a., moore, s. a., and leary, s. c. ( ) getting out what you put in: copper in mitochondria and its impacts on human disease. biochim. biophys. acta. mol. cell. res. , . baertling, f., van den brand, m. m. a., hertecant, j. l., al-shamsi, a., van den heuvel, l. p., distelmaier, f., mayatepek, e., smeitink, j. a., nijtmans, l. g., and rodenburg, r. j. ( ) mutations in coa cause cytochrome c oxidase deficiency and neonatal hypertrophic cardiomyopathy. hum. mutat. , - . papadopoulou, l. c., sue, c. m., davidson, m.
m., tanji, k., nishino, i., sadlock, j. e., krishna, s., walker, w., selby, j., glerum, d. m., coster, r. v., lyon, g., scalais, e., lebel, r., kaplan, p., shanske, s., de vivo, d. c., bonilla, e., hirano, m., dimauro, s., and schon, e. a. ( ) fatal infantile cardioencephalomyopathy with cox deficiency and mutations in sco , a cox assembly gene. nat. genet. , - . valnot, i., osmond, s., gigarel, n., mehaye, b., amiel, j., cormier-daire, v., munnich, a., bonnefont, j. p., rustin, p., and rötig, a. ( ) mutations of the sco gene in mitochondrial cytochrome c oxidase deficiency with neonatal-onset hepatic failure and encephalopathy. am. j. hum. genet. , - . halliwell, b., and gutteridge, j. m. ( ) oxygen toxicity, oxygen radicals, transition metals and disease. biochem. j. , - . foster, a. w., dainty, s. j., patterson, c. j., pohl, e., blackburn, h., wilson, c., hess, c. r., rutherford, j. c., quaranta, l., corran, a., and robinson, n. j. ( ) a chemical potentiator of copper-accumulation used to investigate the iron-regulons of saccharomyces cerevisiae. mol. microbiol. , - . nevitt, t., ohrvik, h., and thiele, d. j. ( ) charting the travels of copper in eukaryotes from yeast to mammals. biochim. biophys. acta. , - . robinson, n. j., and winge, d. r. ( ) copper metallochaperones. annu. rev. biochem. , - . cobine, p. a., ojeda, l. d., rigby, k. m., and winge, d. r. ( ) yeast contain a non-proteinaceous pool of copper in the mitochondrial matrix. j. biol. chem. , - . cobine, p. a., pierrel, f., bestwick, m. l., and winge, d. r. ( ) mitochondrial matrix copper complex used in metallation of cytochrome oxidase and superoxide dismutase. j. biol. chem. , - . timón-gómez, a., nývltová, e., abriata, l. a., vila, a. j., hosler, j., and barrientos, a. ( ) mitochondrial cytochrome c oxidase biogenesis: recent developments. semin. cell. dev. biol. , - . leary, s. c., sasarman, f., nishimura, t., and shoubridge, e. a. ( ) human sco is required for the synthesis of co ii and as a thiol-disulphide oxidoreductase for sco . hum. mol. genet. , - . soma, s., morgada, m. n., naik, m. t., boulet, a., roesler, a. a., dziuba, n., ghosh, a., yu, q., lindahl, p. a., ames, j. b., leary, s. c., vila, a. j., and gohil, v. m. ( ) coa is structurally tuned to function as a thiol-disulfide oxidoreductase in copper delivery to mitochondrial cytochrome c oxidase. cell rep. , - . blaby-haas, c. e., and merchant, s. s. ( ) lysosome-related organelles as mediators of metal homeostasis. j. biol. chem. , - . polishchuk, e. v., and polishchuk, r. s. ( ) the emerging role of lysosomes in copper homeostasis. metallomics. , - . portnoy, m. e., schmidt, p. j., rogers, r. s., and culotta, v. c. ( ) metal transporters that contribute copper to metallochaperones in saccharomyces cerevisiae. mol. genet. genomics. , - . nguyen, t. q., dziuba, n., and lindahl, p. a. ( ) isolated saccharomyces cerevisiae vacuoles contain low-molecular-mass transition-metal polyphosphate complexes. metallomics. , - . rees, e. m., lee, j., and thiele, d. j. ( ) mobilization of intracellular copper stores by the ctr vacuolar copper transporter. j. biol. chem. , - . rees, e. m., and thiele, d. j. ( ) identification of a vacuole associated metalloreductase and its role in ctr -mediated intracellular copper mobilization. j. biol. chem. , - . wu x, kim h, seravalli j, barycki jj, hart pj, gohara dw, di cera e, jung wh, kosman dj, lee j. potassium and the k+/h+ exchanger kha p promote binding of copper to apofet p multi-copper ferroxidase.
j biol chem. apr ; ( ): - . ghosh, a., trivedi, p. p., timbalia, s. a., griffin, a. t., rahn, j. j., chan, s. s., and gohil, v. m. ( ) copper supplementation restores cytochrome c oxidase assembly defect in a mitochondrial disease model of coa deficiency. hum. mol. genet. , - . glerum, d. m., shtanko, a., and tzagoloff, a. ( ) characterization of cox , a yeast gene involved in copper metabolism and assembly of cytochrome oxidase. j. biol. chem. , - . diaz-ruiz, r., uribe-carvajal, s., devin, a., and rigoulet, m. ( ) tumor cell energy metabolism and its common features with yeast metabolism. biochim. biophys. acta. , - . huang, z., chen, k., zhang, j., li, y., wang, h., cui, d., tang, j., liu, y., shi, x., li, w., liu, d., chen, r., sucgang, r. s., and pan, x. ( ) a functional variomics tool for discovering drug-resistance genes and drug targets. cell. rep. , - . smith, a. m., heisler, l. e., mellor, j., kaper, f., thompson, m. j., chee, m., roth, f. p., giaever, g., and nislow, c. ( ) quantitative phenotyping via deep barcode sequencing. genome. res. , - . eden, e., navon, r., steinfeld, i., lipson, d., and yakhini, z. ( ) gorilla: a tool for discovery and visualization of enriched go terms in ranked gene lists. b. m. c. bioinformatics. , . vögtle, f. n., burkhart, j. m., gonczarowska-jorge, h., kücükköse, c., taskin, a. a., kopczynski, d., ahrends, r., mossmann, d., sickmann, a., zahedi, r. p., and meisinger, c. ( ) landscape of submitochondrial protein distribution. nat. commun. , . dancis, a., yuan, d.
s., haile, d., askwith, c., eide, d., moehle, c., kaplan, j., and klausner, r. d. ( ) molecular characterization of a copper transport protein in s. cerevisiae: an unexpected role for copper in iron transport. cell. , - . lin, s. j., and culotta, v. c. ( ) the atx gene of saccharomyces cerevisiae encodes a small metal homeostasis factor that protects cells against reactive oxygen toxicity. proc. natl. acad. sci. u. s. a. , - . gaxiola, r. a., yuan, d. s., klausner, r. d., and fink, g. r. ( ) the yeast clc chloride channel functions in cation homeostasis. proc. natl. acad. sci. u. s. a. , - . ghosh, a., pratt, a. t., soma, s., theriault, s. g., griffin, a. t., trivedi, p. p., and gohil, v. m. ( ) mitochondrial disease genes coa , cox b, and sco have overlapping roles in cox biogenesis. hum. mol. genet. , - . bagh, m.b., peng, s., chandra, g., zhang, z., singh, s. p., pattabiraman, n., liu, a., and mukherjee, a.b. ( ) misrouting of v-atpase subunit v a dysregulates lysosomal acidification in a neurodegenerative lysosomal storage disease model. nat. commun. : . dell'angelica, e. c. ( ) ap- -dependent trafficking and disease: the first decade. curr. opin. cell. biol. , - . lamb, t. m., xu, w., diamond, a., and mitchell, a. p. ( ) alkaline response genes of saccharomyces cerevisiae and their relationship to the rim pathway. j. biol. chem. , - . chen, k. l., ven, t. n., crane, m. m., brunner, m. l. c., pun, a. k., helget, k. l., brower, k., chen, d. e., doan, h., dillard-telm, j. d., huynh, e., feng, y. c., yan, z., golubeva, a., hsu, r. a., knight, r., levin, j., mobasher, v., muir, m., omokehinde, v., screws, c., tunali, e., tran, r. k., valdez, l., yang, e., kennedy, s. r., herr, a. j., kaeberlein, m., and wasko, b. m. ( ) loss of vacuolar acidity results in iron-sulfur cluster defects and divergent homeostatic responses during aging in saccharomyces cerevisiae. geroscience. , - . hughes, c. e., coody, t. k., jeong, m. y., berg, j. a., winge, d. r., and hughes, a. l. 
( ) cysteine toxicity drives age-related mitochondrial decline by altering iron homeostasis. cell. , - . ohya, y., umemoto, n., tanida, i., ohta, a., iida, h., and anraku, y. calcium-sensitive cls mutants of saccharomyces cerevisiae showing a pet- phenotype are ascribable to defects of vacuolar membrane h(+)-atpase activity. j. biol. chem. , - . eide, d. j., bridgham, j. t., zhao, z., and james, m. r. ( ) the vacuolar h+-atpase of saccharomyces cerevisiae is required for efficient copper detoxification, mitochondrial function, and iron metabolism. mol. gen. genet. , - . hughes, a. l., and gottschling, d. e. ( ) an early age increase in vacuolar ph limits mitochondrial function and lifespan in yeast. nature. , - . maeda, t. ( ) the signaling mechanism of ambient ph sensing and adaptation in yeast and fungi. febs. j. , - . pérez-sampietro, m., and herrero, e. ( ) the pacc-family protein rim prevents selenite toxicity in saccharomyces cerevisiae by controlling vacuolar acidification. fungal. genet. biol. , - . xu, w., smith, f. j., subaran, r., and mitchell, a. p. ( ) multivesicular body-escrt components function in ph response regulation in saccharomyces cerevisiae and candida albicans. mol. biol. cell. , - . brett, c. l., kallay, l., hua, z., green, r., chyou, a., zhang, y., graham, t. r., donowitz, m., and rao, r. ( ) genome-wide analysis reveals the vacuolar ph-stat of saccharomyces cerevisiae. plos. one. , e . orij, r., urbanus, m. l., vizeacoumar, f. j., giaever, g., boone, c., nislow, c., brul, s., and smits, g. j.
( ) genome-wide analysis of intracellular ph reveals quantitative control of cell division rate by ph(c) in saccharomyces cerevisiae. genome. biol. , r . dodani, s. c., leary, s. c., cobine, p. a., winge, d. r., and chang, c. j. ( ) a targetable fluorescent sensor reveals that copper-deficient sco and sco patient cells prioritize mitochondrial copper homeostasis. j. am. chem. soc. , - . merz, s., and westermann, b. ( ) genome-wide deletion mutant analysis reveals genes required for respiratory growth, mitochondrial genome maintenance and mitochondrial protein synthesis in saccharomyces cerevisiae. genome. biol. , r . schlecht, u., suresh, s., xu, w., aparicio, a. m., chu, a., proctor, m. j., davis, r. w., scharfe, c., and st onge, r. p. ( ) a functional screen for copper homeostasis genes identifies a pharmacologically tractable cellular system. b. m. c. genomics. , . stenger, m., le, d. t., klecker, t., and westermann, b. ( ) systematic analysis of nuclear gene function in respiratory growth and expression of the mitochondrial genome in s. cerevisiae. microb. cell. , - . weber, r. a., yen, f. s., nicholson, s. p. v., alwaseem, h., bayraktar, e. c., alam, m., timson, r. c., la, k., abu-remaileh, m., molina, h., and birsoy, k. ( ) maintaining iron homeostasis is the key role of lysosomal acidity for cell proliferation. mol. cell. , - . yambire, k. f., rostosky, c., watanabe, t., pacheu-grau, d., torres-odio, s., sanchez-guerrero, a., senderovich, o., meyron-holtz, e. g., milosevic, i., frahm, j., west, a. p., and raimundo, n. ( ) impaired lysosomal acidification triggers iron deficiency and inflammation in vivo. elife. , e . giles, n. m., watts, a. b., giles, g. i., fry, f. h., littlechild, j. a., and jacob, c. ( ) metal and redox modulation of cysteine protein function. chem. biol. , - . rigo, a., corazza, a., di paolo, m. l., rossetto, m., ugolini, r., and scarpa, m. 
( ) interaction of copper with cysteine: stability of cuprous complexes and catalytic role of cupric ions in anaerobic thiol oxidation. j. inorg. biochem. , - . taylor, a. b., stoj, c. s., ziegler, l., kosman, d. j., and hart, p. j. ( ) the copper- iron connection in biology: structure of the metallo-oxidase fet p. proc. natl. acad. sci. u. s. a. , - . llinares, e., barry, a. o., and andre, b. ( ) the ap- adaptor complex mediates sorting of yeast and mammalian pq-loop-family basic amino acid transporters to the vacuolar/lysosomal membrane. sci. rep. , . jo, w. j., loguinov, a., chang, m., wintz, h., nislow, c., arkin, a. p., giaever, g. and vulpe, c. d. ( ) identification of genes involved in the toxic response of saccharomyces cerevisiae against iron and copper overload by parallel analysis of deletion mutants. toxicol. sci. , - . jo, w. j., kim, j. h., oh, e., jaramillo, d., holman, p., loguinov, a. v., arkin, a. p., nislow, c., giaever, g., and vulpe, c. d. ( ) novel insights into iron metabolism by .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / genetic regulators of mitochondrial copper integrating deletome and transcriptome analysis in an iron deficiency model of the yeast saccharomyces cerevisiae. b.m.c. genomics. , . chun, c. d., and madhani, h.d. ( ) ctr links copper homeostasis to polysaccharide capsule formation and phagocytosis inhibition in the human fungal pathogen. crytococcus neoformans. plos. one. , e . ishizaki, h., spitzer, m., wildenhain, j., anastasaki, c., zeng, z., dolma, s., shaw, m., madsen, e., gitlin, j., marais, r., tyers, m., and patton, e. e. 
( ) combined zebrafish-yeast chemical-genetic screens reveal gene-copper-nutrition interactions that modulate melanocyte pigmentation. dis. model. mech. , - . colacurcio, d. j., and nixon, r. a. ( ) disorders of lysosomal acidification-the emerging role of v-atpase in aging and neurodegenerative disease. ageing. res. rev. , - . corrionero, a., and horvitz, h. r. ( ) a c orf als/ftd ortholog acts in endolysosomal degradation and lysosomal homeostasis. curr. biol. , - . desai, v., and kaler, s. g. ( ) role of copper in human neurological disorders. am. j. clin. nutr. , s- s . kaler, s. g. ( ) inborn errors of copper metabolism. handb. clin. neurol. : - . nixon, r. a., yang, d. s., and lee, j. h. ( ) neurodegenerative lysosomal disorders: a continuum from development to late age. autophagy. , - . nguyen, m., wong, y. c., ysselstein, d., severino, a., and krainc, d. ( ) synaptic, mitochondrial, and lysosomal dysfunction in parkinson's disease. trends. neurosci. , - . stepien, k. m., roncaroli, f., turton, n., and hendriksz, c. j., ( ) roberts m, heaton ra, hargreaves i. mechanism of mitochondrial dysfunction in lysosomal storage disorders: a review. j. clin. med. , . ammann, s., schulz a., krägeloh-mann, i., dieckmann, n. m., niethammer, k., fuchs. s., eckl, k. m., plank, r., werner, r., altmüller, j., thiele, h., nürnberg, p., bank, j., strauss, a., von bernuth, h., zur, stadt, u., grieve, s., griffiths, g. m., lehmberg, k., hennies, h. c., and ehl, s. ( ) mutations in ap d associated with immunodeficiency and seizures define a new type of hermansky-pudlak syndrome. blood. , - . dell'angelica, e. c., shotelersuk, v., aguilar, r. c., gahl, w. a., and bonifacino, j. s. ( ) altered trafficking of lysosomal proteins in hermansky-pudlak syndrome due to mutations in the beta a subunit of the ap- adaptor. mol. cell. , - . . el-chemaly, s., and young, l. r. ( ) hermansky-pudlak syndrome. clin. chest. med. , - . korvatska, o., strand, n. s., berndt, j. 
d., strovas, t., chen, d. h., leverenz, j. b., kiianitsa, k., mata, i. f., karakoc, e., greenup, j. l., bonkowski, e., chuang, j., moon, r. t., eichler, e. e., nickerson, d. a., zabetian, c. p., kraemer, b. c., bird, t. d., and raskind, w. h. ( ) altered splicing of atp ap causes x-linked parkinsonism with spasticity (xpds). hum. mol genet. , - . lee, j. h., yu, w. h., kumar, a., lee, s., mohan, p. s., peterhoff, c. m., wolfe, d. m., martinez-vicente, m., massey, a. c., sovak, g., uchiyama, y., westaway, d., cuervo, a. m., and nixon, r. a. ( ) lysosomal proteolysis and autophagy require presenilin and are disrupted by alzheimer-related ps mutations. cell. , - . janke, c., magiera, m. m., rathfelder, n., taxis, c., reber, s., maekawa, h., moreno- borchart, a., doenges, g., schwob, e., schiebel, e., and knop, m. ( ) a versatile .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / genetic regulators of mitochondrial copper toolbox for pcr-based tagging of yeast genes: new fluorescent proteins, more markers and promoter substitution cassettes. yeast. , - . pan, x., yuan, d. s., xiang, d., wang, x., sookhai-mahadeo, s., bader, j. s., hieter, p., spencer, f., and boeke, j. d. ( ) a robust toolkit for functional profiling of the yeast genome. mol. cell. , - . . meisinger, c., pfanner, n., and truscott, k. n. ( ) isolation of yeast mitochondria. methods. mol. biol. , - . haas, a. ( ) a quantitative assay to measure homotypic vacuole fusion in vitro. methods. cell. sci. , - . horn, d., al-ali, h., and barrientos, a. ( ) cmc p is a conserved mitochondrial twin cx c protein involved in cytochrome c oxidase biogenesis. mol. cell. biol. 
funding and additional information: this work was supported by the national institutes of health awards r gm to vmg, r gm to cdk, and f gm to nmg. nmg was also supported by national science foundation award hrd- in the first year of this work. the content is solely the responsibility of the authors and does not necessarily represent the official views of the national institutes of health or the national science foundation.

acknowledgements
we thank valentina canedo pelaez for assistance with growth measurements and dr. thomas meek for allowing us to use the biotek synergy™ mx microplate reader. we gratefully acknowledge xuewen pan for the kind gift of the variomics library from which the barcoded deletion pool used here was derived.

conflict of interest: the authors declare that they have no conflicts of interest with the contents of this article.

author contributions
vmg conceptualized the project. vmg, nmg, and atg designed the experiments. nmg, atg, and mz performed the experiments. cdk and cq designed the bar-seq protocols and generated the yeast deletion collection. nmg, atg, and vmg analyzed the data and wrote the manuscript. vmg supervised the whole project and was responsible for the resources and primary funding acquisition. all authors commented on the manuscript.

figure legends

figure . schematic of genome-wide copper-sensitized screen. the yeast deletion library is a collection of ~ mutants where each mutant has a gene replaced with a kanmx cassette that is flanked by unique up tag (up) and down tag (dn) sequences. the deletion mutant pool was grown in fermentable (ypd) and non-fermentable (ypge) medium with and without µm cucl supplementation until cells reached an optical density of . . genomic dna was isolated from harvested cells and used as template to amplify up and dn tag dna barcode sequences using universal primers. pcr products were then sequenced and the resulting data analyzed. mutants with deletions in genes required for respiratory growth are expected to grow poorly in non-fermentable medium, resulting in reduced barcode reads for those genes. however, if the function of the same gene(s) is supported by copper supplementation, then we expect increased barcode reads for that gene(s) in copper-supplemented non-fermentable growth medium.

figure . yeast genes required for respiratory growth. (a) growth of each mutant in the deletion collection cultured in ypge and ypd media was measured by bar-seq and analyzed by t-scores. t(ypge/ypd) is plotted for top and bottom mutants. known mitochondrial respiratory genes are highlighted in red. (b-c) gene ontology analysis was used to identify the top five cellular processes (b) and cellular components (c) that were significantly enriched amongst our top scoring hits from a rank-ordered list, where ranking was done from the lowest to highest t-score. es indicates enrichment score. (d) a schematic of mitochondrial oxphos subunits and assembly factors, where genes depicted in red were “hits” in the screen with t-score values below - . (p-value ≤ . ).

figure . genes required for copper homeostasis. (a) t(ypge + cu/ypge) score is plotted for top and bottom mutants. known copper homeostasis genes are highlighted in red. novel top hits belonging to two major cellular processes are highlighted in blue. (b-c) gene ontology analysis was used to identify the top five cellular processes (b) and cellular components (c) that were significantly enriched in our top scoring hits. es indicates enrichment score. (d) top hits are mapped along the secretory pathway. red arrows point to top hit genes. a dashed arrow indicates that the protein is not a subunit of the complex but is involved in the maintenance of the listed protein.

figure . loss of ap- results in reduced vacuolar and mitochondrial function. (a) serial dilutions of wt and the indicated mutants were seeded onto ypd and ypge plates with and without μm cucl , mgcl and zncl and grown at °c for two (ypd) or four days (ypge). coa Δ cells, which have been previously shown to be rescued by cucl , were used as a control. (b) whole cell protein lysate was analyzed by sds-page/western blot using a cox specific antibody to detect cco abundance. stain free imaging served as a loading control. coa Δ cell lysate was used as a control for decreased cox levels. (c) vacuolar ph of wt and aps Δ cells was measured using bcecf-am dye. (d) whole cell lysate and isolated vacuole fractions were analyzed by sds-page/western blot. vma was used to determine v-atpase abundance. prc and pgk served as loading controls for vacuolar and whole cell protein lysate, respectively.

figure . normalization of vacuolar ph in rim Δ cells restores mitochondrial copper homeostasis. (a) vacuolar ph of wt and rim Δ cells was measured by bcecf-am dye. (b) serial dilutions of wt and the indicated mutants were seeded onto ypd and ypge plates with and without μm cucl , mgcl , or zncl and grown at °c for two (ypd) or four days (ypge). (c) cellular and (d) mitochondrial copper levels were measured by icp-ms. (e) mitochondrial proteins were analyzed by sds-page/western blot. cox served as a marker for cco levels, and por served as a loading control. (f) cco activity was measured spectrophotometrically and normalized to the citrate synthase activity. (g) vacuolar ph of wt and rim Δ cultured in standard (ph . ) or acidified (ph . ) ypge medium was measured by bcecf-am dye. (h) the growth of wt and rim Δ cells in ypge medium of different ph. decrease in growth at each ph was calculated by normalizing to growth at ph . . (i) cco activity of wt and rim Δ cultured in standard or acidified ypge was normalized to citrate synthase activity. (j) the growth of wt and rim Δ in ypge + µm cucl medium of different ph. decrease in growth at each ph was calculated by normalizing to growth at ph . .

figure . pharmacological inhibition of v-atpase decreases mitochondrial copper content. (a) vacuolar ph of wt cells grown in the presence of either dmso or , , , nm conca. (b) mitochondrial proteins in wt cells treated with dmso or nm conca were analyzed by sds-page/western blot. cox served as a marker for cco abundance; atp and por were used as loading controls. (c) cco activity in wt cells treated with dmso or nm conca is shown after normalization to citrate synthase activity. (d) mitochondrial copper levels in wt cells treated with dmso or nm conca were determined by icp-ms.

table : saccharomyces cerevisiae strains used in this study.
yeast strain | genotype | source
by wt | mata, his , leu , met , ura | greenberg, m.l.
by coa Δ | mata, his , leu , met , ura , coa Δ:: kanmx | open biosystems
by gef Δ | mata, his , leu , met , ura , gef Δ:: kanmx | open biosystems
by aps Δ | mata, his , leu , met , ura , aps Δ:: kanmx | open biosystems
by aps Δ - nmg | mata, his , leu , met , ura , aps Δ:: kanmx | this study
by apm Δ | mata, his , leu , met , ura , apm Δ:: kanmx | open biosystems
by apl Δ | mata, his , leu , met , ura , apl Δ:: kanmx | open biosystems
by apl Δ | mata, his , leu , met , ura , apl Δ:: kanmx | open biosystems
by rim Δ | mata, his , leu , met , ura , rim Δ:: kanmx | open biosystems
by rim Δ - nmg | mata, his , leu , met , ura , rim Δ:: kanmx | this study
by rim Δ | mata, his , leu , met , ura , rim Δ:: kanmx | open biosystems

[figure . workflow schematic: yeast deletion collection (~ , different mutants, each carrying an antibiotic resistance cassette flanked by up and dn tags); grow deletion collection to early stationary phase in ypd, ypge, or ypge + cu; harvest cells and isolate genomic dna; pcr amplify tags for each mutant; sequence dna barcodes; statistical analysis.]
[figure . panels: (a) t-score (ypge-ypd) for bottom and top mutants; labeled mutants include cox aΔ, rcf Δ, coa Δ, pet Δ and coq Δ. (b) gene set - process (p-value, es): cellular respiration; mitochondrial respiratory chain complex assembly; cytochrome oxidase assembly; mitochondrial respiratory chain complex iv assembly; respiratory chain complex iv assembly. (c) gene set - component (p-value, es): mitochondrial part; mitochondrial inner membrane; mitochondrial membrane part; mitochondrial membrane; organelle inner membrane. (d) schematic of oxphos complexes (cii, ciii, civ, cv, coq, cyt c) with their subunits and assembly factors.]

[figure . panels: (a) t-score (ypge cu-ypge) for top and bottom mutants; labeled mutants include ccc Δ, apl Δ, atx Δ, apl Δ, aps Δ, coa Δ, gsh Δ, apm Δ and ctr Δ. (b) gene set - process (p-value, es): golgi to vacuole transport; post-golgi vesicle-mediated transport; protein targeting to vacuole; transition metal ion homeostasis; chemical homeostasis. (c) gene set - component (p-value, es): ap- adaptor complex; ap-type membrane coat adaptor complex; golgi apparatus; cytoplasmic vesicle; intracellular vesicle. (d) secretory pathway map (golgi, endosome, multivesicular body, vacuole) showing the ap- complexes, the rim pathway and related top hits.]

[figure . panels: (a) growth of wt, coa Δ, aps Δ and apl Δ strains on ypd and ypge with no addition, + µm cucl , + µm mgcl or + µm zncl . (b) cox western blot with stain free loading control. (c) vacuole ph of wt and aps Δ cells. (d) vma , prc and pgk western blot of whole cell lysate and isolated vacuoles from wt and aps Δ cells.]

[figure . panels: vacuolar ph of wt and rim Δ cells; growth on ypd and ypge with no addition, + µm cucl , + µm mgcl or + µm zncl ; total cellular copper (ng/mg of cell pellet); mitochondrial copper (ng/mg of mitochondria); cox and por western blot; cco activity (normalized to citrate synthase); vacuolar ph and cco activity at different media ph; decrease in growth (normalized to growth at ph . ) versus ypge ph and ypge+cu ph for wt and rim Δ.]

[figure . panels: (a) vacuole ph of wt cells with conca. (b) cox , atp and por western blot of wt and wt + conca. (c) cco activity (normalized to citrate synthase). (d) mitochondrial copper (ng/mg of mitochondria).]
high-throughput tandem-microwell assay for ammonia repositions fda-approved drugs to helicobacter pylori infection

fan liu,a,b,# jing yu,b,# yan-xia zhang,c fangzheng li,a,d qi liu,e yueyang zhou,a shengshuo huang,b houqin fang,f zhuping xiao,e lujian liao,f jinyi xu,d xin-yan wu,c fang wu a,*

a key laboratory of systems biomedicine (ministry of education), shanghai center for systems biomedicine, shanghai jiao tong university, shanghai, , china
b state key laboratory of microbial metabolism, sheng yushou center of cell biology and immunology, school of life science and biotechnology, shanghai jiao tong university, shanghai, , china
c school of chemistry & molecular engineering, east china university of science and technology, shanghai, , china
d state key laboratory of natural medicines and department of medicinal chemistry, china pharmaceutical university, nanjing, , china
e hunan engineering laboratory for analyse and drugs development of ethnomedicine in wuling mountains, jishou university, hunan, , china
f shanghai key laboratory of regulatory biology, school of life sciences, east china normal university, shanghai, , china

# these authors contributed equally to this work.
* to whom correspondence may be addressed. email: fang.wu@sjtu.edu.cn

running title: repositioning of old drugs to treat h. pylori infection

biorxiv preprint doi: https://doi.org/ . / . . . ; this version posted january , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license (http://creativecommons.org/licenses/by-nc-nd/ . /).

abstract
to date, little attempt has been made to develop new treatments for helicobacter pylori (h. pylori), although the community is aware of the shortage of treatments for h. pylori. in this study, we developed a -tandem-microwell-based high-throughput assay for ammonia, a known virulence factor of h. pylori and a product of urease. we identified a few drugs, i.e., panobinostat, dacinostat, ebselen, captan and disulfiram, that potently inhibit the activity of ureases from bacterial or plant species. these inhibitors suppress the activity of urease via substrate-competitive or covalent-allosteric mechanisms, and all except captan prevent the antibiotic-resistant h. pylori strain from infecting human gastric cells, with a more pronounced effect than acetohydroxamic acid, a well-known urease inhibitor and clinically used drug for the treatment of bacterial infection. this study offers several bases for the development of new treatments for urease-containing pathogens and for studying the mechanism responsible for the regulation of urease activity.

key words: ammonia, high-throughput screening, antibiotic resistance, enzyme inhibitor, urease, mechanism of action, helicobacter pylori

introduction
bacteria, fungi and plants, but not animals, contain urease ( ). urease (ec . . . ) is a class of nickel metalloenzyme that hydrolyzes amino acid metabolites to produce ammonia (nh ) and carbon dioxide ( , ). the active catalytic site of urease consists of two nickel ions, a carbamylated lysine residue, two histidines and an aspartic acid.
in addition to the consistent catalytic mechanism, the amino acid sequence of urease has been reported to be highly conserved between different species ( ). bacterial urease is known to be a key virulence factor of some pathogens for a number of diseases ( ), e.g., helicobacter pylori (h. pylori) for gastritis or gastric cancer, and proteus mirabilis (p. mirabilis) for urinary tract infections and urinary stones ( ). the pathogens can hydrolyze urea substrates to produce nh . the released nh not only helps h. pylori to survive in the low ph environment of the stomach but also causes damage to the gastric mucosa, triggering the infection ( ). additionally, nh generated by p. mirabilis urease has been demonstrated to form urinary stones and destroy the urinary epithelium in the urinary system ( ). because the human body does not contain urease, bacterial urease has been thought to be an important and specific drug target for combating these pathogens ( ). a number of studies have been performed to identify inhibitors of urease ( - ), but only one urease inhibitor, acetohydroxamic acid (aha), was approved for the treatment of urinary infections and urinary stones in by the us food and drug administration (fda) ( , ). severe side effects, low stability in gastric juice, and a lack of direct evidence for suppressing the growth of pathogens seem to be the factors limiting the success rate of these urease inhibitors.
adverse side effects of aha, including teratogenic effects ( ), a low efficiency indicated by the high dose required for the patient (~ mg/day for adults), and the assumed drug resistance of bacteria, further imply that potent and bioactive inhibitors with new chemical moieties are urgently needed to combat these pathogens. indeed, the current clinical first-line regimen for the treatment of h. pylori [proton-pump inhibitor, clarithromycin, amoxicillin or metronidazole (sometimes tinidazole)] ( , ) is unable to completely eradicate h. pylori due to increased antibiotic resistance ( , ). to date, few validated high-throughput assays have been constructed to quantitatively analyze nh and the activity of the nh -generating enzyme urease, and no high-throughput screening approach has been employed to systematically extend the chemical moieties of urease inhibitors. current assays to determine the activity of urease mainly rely on colorimetric reactions that determine the concentration of nh using the indophenol or nessler’s reaction ( ). recently, a microfluidic chip-based fluorometric assay has been developed to monitor the activity of urease ( , ). in addition, a cell-based assay for h. pylori urease has been reported lately and validated with known inhibitors of urease, but it has not yet been employed to screen for new urease inhibitors ( ). overall, the current assay settings and procedures are relatively time-consuming and vulnerable to interference. in this study, we established and validated a new tandem-well-based hts assay for nh and nh -generating urease and performed an hts screening campaign to identify druggable chemical entities from , fda- or foreign-approved drugs (fad) for jack bean and bacterial ureases. five clinically used drugs, i.e., panobinostat, dacinostat, ebselen (ebs), captan and disulfiram, were found to be submicromolar inhibitors of h. pylori urease (hpu), jack bean urease (jbu), or urease from ochrobactrum anthropi (o. anthropi), a newly identified pathogen with resistance to β-lactam antibiotics ( ). moreover, panobinostat, dacinostat, ebs and disulfiram potently inhibited the infection of h. pylori, suggesting that these pharmacologically active moieties or drugs could serve as bases for the development of new treatments for urease-positive pathogens.

results

development of a high-throughput assay and identification of potent inhibitors for urease
to construct a high-throughput assay for nh -generating urease and prevent detection interference from substances in the enzyme extraction, we utilized a -tandem-well-based gas-detection method, which we previously developed to monitor the activity of h s-generating enzymes ( , ). the tandem-well design physically separates the gas product from the enzymatic reaction and enables specific, real-time detection of the gas-producing enzyme activity (figure a).
to construct the hts assay, we compared three reported protocols for determining the activity of jbu, using salicylic acid-hypochlorite, nessler’s reagent, or phenol red ( , , ); the first two detect nh via the indophenol and nessler’s reactions, respectively. salicylic acid-hypochlorite and nessler’s reagents could dose-dependently and time-dependently monitor the activity of jbu at various concentrations (figures s a and b); however, phenol red failed to detect it (figure s c). we chose salicylic acid-hypochlorite as the detection reagent for the hts screening assay of jbu (figure s a) because it is less toxic than nessler’s reagent, which contains mercury ( ). the absorbance (od) at nm of the blue indophenol complex generated from salicylic acid correlated linearly with the concentration of nh cl ( . - m), thus validating the analytic setup for nh quantification (figure s d). moreover, the optimal assay buffer for jbu was found to be phosphate buffer at ph . (figure s e). in contrast, we employed nessler’s reagent to detect the activity of hpu and ochrobactrum anthropi urease (oau) in subsequent studies since it showed a better limit of detection for hpu activity than salicylic acid-hypochlorite (figures s f and g). collectively, we chose nm of jbu and mm urea substrate in the phosphate buffer to perform the assay. under the assay conditions, aha showed an ic of ~ μm (figure b), which was very similar to the previously reported value (ic of ~ μm; ref. 
( )), indicating that the newly developed assay for urease was accurate and reliable. however, the ic50 of aha was found to decrease to . μm when the mm tris buffer was used instead of the phosphate buffer in our assay (table ). to determine the well-to-well reproducibility, the assay was validated with m aha (~ ic50) or m (~ -fold ic50) aha. the tandem-well plate consistently showed distinct differences among the control, the m-aha-treated and the m-aha-treated groups (figure c). the average z’ value of the assay was found to be ~ . when calculated with the m aha positive control. to identify novel and potent inhibitors of urease, we screened , fda- or fad-approved drugs at μm. five potent hits, i.e., panobinostat, dacinostat, ebs, captan and disulfiram, were found to dose-dependently inhibit the activity of jbu with ic50 values of . , . , . , . and . m, which are ~ , , , , , -fold more potent than aha, respectively (figure e and table ). intriguingly, the former two drugs are analogs of aha. importantly, all of them appeared to be substantially selective for urease, since they did not substantially inhibit other gas-producing enzymes, i.e., cystathionine beta-synthase (cbs) and cystathionine γ-lyase (cse), two h2s-generating enzymes (figure f). moreover, the potent inhibitory effects of these inhibitors were likely due to on-target inhibition of jbu rather than to a nonspecific reaction with nh3 or to aggregation, since they did not react with nh3 and their inhibition was not attenuated by the detergent (figures s a and s b).
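the ic50 values and fold-potency comparisons quoted above come from dose-response curves; the methods state that ic50 was read off by semilogarithmic graphing. as a minimal sketch of that idea, assuming percent-inhibition readings at ascending concentrations, the 50 % crossing can be interpolated linearly on a log10 concentration axis. the function names and the numbers in the usage lines are hypothetical and are not the study's data or the graphpad prism routine the authors used.

```python
import math


def ic50_by_semilog_interpolation(concs, inhibition):
    """Estimate IC50 by linear interpolation on a log10(concentration) axis.

    concs: inhibitor concentrations in ascending order (any unit, all > 0).
    inhibition: percent inhibition measured at each concentration.
    Returns the concentration where inhibition crosses 50 %, or None when
    no pair of neighbouring points brackets 50 %.
    """
    points = list(zip(concs, inhibition))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo <= 50.0 <= y_hi and y_hi != y_lo:
            # Fraction of the way from the lower to the upper point.
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


def fold_potency(ic50_reference, ic50_compound):
    """How many times more potent a compound is than a reference inhibitor."""
    return ic50_reference / ic50_compound


# Hypothetical doubling-dilution series, for illustration only.
example_concs = [1.0, 2.0, 4.0, 8.0, 16.0]
example_inhibition = [12.0, 30.0, 48.0, 70.0, 88.0]
example_ic50 = ic50_by_semilog_interpolation(example_concs, example_inhibition)
```

interpolation of this kind only needs two points that bracket 50 % inhibition, whereas a sigmoidal fit uses the whole curve; the two should agree closely when the dilution steps are fine.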
corroborating these findings, ebs and disulfiram have recently been reported to be specific inhibitors of bacterial and plant urease( , ), respectively, although their modes of action for inhibiting urease and their effects on the proliferation or infection of urease-containing pathogens remain little explored.

the mode of action study for urease inhibitors

to determine the reversibility of the inhibition of jbu by panobinostat, dacinostat, ebs, captan and disulfiram, various concentrations of the inhibitors and jbu were incubated together for min (figure a). after a -fold dilution, the inhibitory effects of panobinostat, dacinostat and disulfiram were found to be reversible (figures a and s c). in contrast, ebs or captan at nm was found to completely block the activity of jbu; this concentration did not affect the activity without pre-incubation with the enzyme (figure e). additionally, the activity inhibited by ebs or captan was not fully recovered (figure a), indicating that both were likely to be covalent or slow-dissociation inhibitors of jbu. surprisingly, the inhibitory effect of disulfiram was found to depend on the concentration of ni2+ ion, the catalytic cofactor of urease (figure s c), indicating that it likely inhibits jbu by forming a complex with the catalytic ni2+ ion and subsequently occupying the active site of jbu. this explanation seems plausible, since recent findings have revealed that disulfiram inhibits the proliferation of tumor cells by forming a complex with cu2+( ).
moreover, the inhibitory potencies of panobinostat and dacinostat were found to increase with the pre-incubation time of the compound with urease (figure b). after h pre-incubation, the ic50 values of panobinostat and dacinostat decreased ~ . -fold and ~ . -fold, respectively (figure b). in enzyme kinetics studies of jbu, panobinostat and dacinostat were found to be competitive inhibitors with respect to the urea substrate, with ki values of . and . m (figure c and table ), which are ~ -fold and -fold more potent than aha (ki ~ . m; table ). consistent with this observation, the inhibition by these two inhibitors was not interfered with by ni2+ (figure s a). also, the addition of histidine or cysteine had no effect on the inhibition by panobinostat or dacinostat (figure s b). importantly, the surface plasmon resonance assay demonstrated that these two compounds could physically bind to jbu (figure d; table ). the drastic effect seems to rely not only on the hydroxamic acid motif, the known pharmacophore of aha-derivative inhibitors, but also on the hydrophobic ring and the secondary amine group: the benzene ring interacts favorably with the his residue and/or the nitrogen atom forms an additional hydrogen bond with asp in the modeled inhibitor-jbu complex structure (figure e). in contrast, the inhibition caused by both ebs and captan was found to be prevented by the addition of dithiothreitol (dtt) or free cysteine to the enzymatic reaction, but not by that of histidine or ni2+ (figures s a-c).
furthermore, the ic50 values of the two inhibitors increased linearly with the concentration of the enzyme (figure s d), a feature of covalent inhibitors( ), confirming that they targeted the enzyme covalently. the inhibition constants for these irreversible inhibitors, i.e., the maximal rate of enzyme inactivation (kinact) and the inactivation constant (ki), were also determined by nonlinear regression of the time-dependent ic50 values (figure s e)( ). the kinact and ki for ebs were found to be . × - s- and . m, which were . and . -fold better than those of captan (kinact, . × - s- ; ki, . m), respectively. taken together, the results demonstrated that ebs and captan inhibited jbu by covalently modifying a cys rather than a his residue, the latter of which is known to lie in the active site of urease ( , ). interestingly, we observed a synergistic inhibitory effect from the combination of ebs and aha (figure a), a substrate-competitive inhibitor of urease, implying that ebs targeted cys residue(s) at another site rather than the active site. similar experimental results were also obtained for captan. moreover, the combination of ebs with m captan also significantly increased the potency of ebs by -fold (right panel, figure a), implying distinct binding sites for the two covalent inhibitors. to corroborate this finding, we performed enzyme kinetics, mass spectrometry and surface plasmon resonance studies (figures b-d). consistently, ebs or captan displayed a noncompetitive mode with respect to the urea substrate (figure b).
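the contrast drawn above between competitive inhibition (panobinostat, dacinostat) and noncompetitive inhibition (ebs, captan) can be illustrated with the textbook michaelis-menten rate laws. the sketch below assumes the pure noncompetitive form, in which the inhibitor binds free enzyme and the enzyme-substrate complex with the same ki; the source does not state which variant was fitted, and all parameter values used with these functions are hypothetical.

```python
def rate_competitive(s, i, vmax, km, ki):
    """Competitive inhibition: v = Vmax*S / (Km*(1 + I/Ki) + S)."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def rate_noncompetitive(s, i, vmax, km, ki):
    """Pure noncompetitive inhibition: v = Vmax*S / ((1 + I/Ki) * (Km + S))."""
    return vmax * s / ((1.0 + i / ki) * (km + s))


def apparent_parameters(mode, i, vmax, km, ki):
    """Return (apparent Vmax, apparent Km) at inhibitor concentration i.

    A competitive inhibitor raises the apparent Km and leaves Vmax unchanged;
    a pure noncompetitive inhibitor lowers the apparent Vmax and leaves Km
    unchanged. On a Lineweaver-Burk plot these give lines that intersect on
    the y-axis and on the x-axis, respectively.
    """
    factor = 1.0 + i / ki
    if mode == "competitive":
        return vmax, km * factor
    if mode == "noncompetitive":
        return vmax / factor, km
    raise ValueError("mode must be 'competitive' or 'noncompetitive'")
```

the practical consequence is that a high substrate concentration overcomes a competitive inhibitor but not a noncompetitive one, which is in line with the persistence of ebs and captan inhibition at high urea described in the source.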
furthermore, tandem mass spectrometry analysis revealed that cys and cys , which are not adjacent to the active site, appeared to be modified by ebs and captan, respectively (figure c). an addition of . daltons in molecular weight was observed for ebs, demonstrating the breakage of the se-n bond and formation of an se-s bond with the cys residue, a phenomenon that has been reported previously for ebs( ). however, the increase of . daltons suggested that only the isoindole dione moiety of captan modified the cys residue, accompanied by the release of the trichloromethyl thio moiety [-sc(c ) ]. this observation provides a new perspective on the unexplored covalent molecular mechanism of captan. additionally, a potent physical interaction between ebs or captan and jbu was observed in the surface plasmon resonance study (figure d). the equilibrium dissociation constants (kd) for ebs and captan were found to be and nm, respectively. to illustrate the binding mode of ebs or captan, we modeled them into the respective allosteric cys-containing pocket (cys for ebs, cys for captan) in jbu by using molecular dynamics simulations (figure e). the carbonyl group of ebs was found to form a hydrogen bond with lys , and the phenyl ring interacts with the hydrophobic side chain of leu . additionally, the two carbonyl groups of captan formed four hydrogen bonds with the side chains of asn , his , tyr and asn . taken together, these results implied that these intermolecular weak interactions also
substantially contributed to the binding of the covalent inhibitors to the protein, in addition to the covalent interaction.

the inhibitory effect of inhibitors on bacterial ureases

next, we investigated the effects of panobinostat, dacinostat, ebs, captan and disulfiram on the activity of hpu and oau, two bacterial ureases from h. pylori and o. anthropi, respectively. as expected, these drugs inhibited the activity of hpu in the crude extracts and showed ic50 values of . m, . m, . m, . and . m, which indicated that they were ~ , , , and -fold more potent than aha (ic50 ~ . m; figure a and table ), respectively. moreover, panobinostat, dacinostat, ebs, captan and disulfiram were also found to inhibit the partially purified hpu, which was isolated by size-exclusion chromatography (figures b and s ). consistently, they also suppressed the activity of oau with a potency similar to that for hpu (figure a and table ). compounds , and , which were synthesized in house (scheme s ), as well as commercially available ebs oxide, were also more potent than ebs (ic50 ~ . m) in the in vitro hpu-based enzyme assay (table s ), and displayed a maximum three-fold increase in potency (ic50 ~ . m; table s ). moreover, we could confirm that panobinostat, dacinostat and ebs, as well as ebs oxide, , or , could largely suppress the activity of hpu in culture (figure c). the ic50 values of these inhibitors for inhibiting the urease of the cultured h. pylori strain ranged from . to . m (figure c and table s ). further, we investigated the effects of panobinostat, dacinostat and ebs, which are the
most potent inhibitors for hpu (figure a). the results showed that ebs, but not panobinostat, dacinostat or their analog aha, substantially suppressed the growth of h. pylori (figure d). the inability of aha and its derivatives, i.e. panobinostat and dacinostat, to suppress the growth of h. pylori, as identified above, is consistent with the previous finding that aha does not inhibit the growth of h. pylori( ). interestingly, ebs and ebs analogs, as well as disulfiram, could dose-dependently suppress the growth of h. pylori and showed a minimum inhibitory concentration (mic) in a range between and g/ml (right panel of figure d, figure s a and table s ). importantly, the inhibitory effect of this type of covalent inhibitor lasted for a long period in culture, as indicated by ebs and , which could substantially inhibit hpu even h after removal of the inhibitor (figure s b).

urease inhibitors prevent h. pylori infection in a gastric cell-based bacterial infection model

to evaluate the ability of these urease inhibitors to prevent h. pylori infection, we constructed a gastric cell-based bacterial infection model that uses the number of remaining viable sgc- adenocarcinoma gastric cells to reflect the virulence of h. pylori( ). our results showed that treatment with m panobinostat, m dacinostat, m ebs or m disulfiram could prevent the cell death triggered by h. pylori (figures a-b). in sharp contrast, the cells that lacked such treatments were largely destroyed. panobinostat and ebs were found to be the most potent agents and almost completely protected the cells from infection by h. pylori. the effects of these drugs seemed
to be much stronger than the effects of m aha or m tinidazole, an analog of metronidazole and one of the two antibiotics in the triple regimens for the treatment of h. pylori ( , ). in support of this observation, tinidazole as well as metronidazole hardly suppressed the growth of our h. pylori strain, with an mic value of more than g/ml in culture (figure s a and table s ), indicating that this strain is resistant to treatment with nitroimidazole-type antibiotics. since panobinostat, dacinostat, ebs and disulfiram at a concentration up to m or m did not interfere with the proliferation of sgc- gastric cells (figure s b), the protective effects in the gastric-cell-based h. pylori infection model seemed to be attributable to on-target inhibition of the infection caused by h. pylori. moreover, all four drugs potently reduced the level of ammonia in the cell medium (figure c), indicating that they efficiently suppressed the endogenous urease activity of h. pylori in the infection model.

the structural basis and inhibitory mechanisms of three newly identified classes of urease inhibitors

to identify the active chemical moiety of panobinostat, dacinostat, ebs or captan required for inhibition of urease, we analyzed their structure-activity relationships (figures , table s and s ). the former two inhibitors are hydroxamic acid-based urease inhibitors, and not only do their hydroxyamino heads form hydrogen bonds with the catalytic ni2+ and residues in jbu or hpu (asp or ala for jbu; asp
or ala for hpu), but the acetyl group also forms one hydrogen bond (his for jbu and his for hpu; figures e and s a). consistent with this observation, the hydroxyamino and acetyl groups of aha interact with asp or ala and his , respectively, in a co-crystal structure of aha and hpu( ) (figure s a). removal of this acetyl group, as in hydroxylamine, completely abolished the inhibitory effect of this type of inhibitor (figure b and table s ). apart from these interactions, the hydrophobic benzene ring and secondary amine group of panobinostat were found to be additional pharmacophores (upper panel, figure e), which interact favorably with his (jbu) or his (hpu) and form an extra hydrogen bond with asp (jbu) or asp (hpu). in support of this finding, the hydroxamic acid analogs that lack the benzene ring, i.e. ricolinostat, ilomastat and pracinostat, are inactive against jbu and hpu (figure b and table s ). strikingly, the replacement of benzene with benzimidazole (pracinostat) completely abolished the inhibition, suggesting that the benzene ring is critical for maintaining the inhibition. moreover, the secondary amine group also seems to be important for enhancing the potency of this type of inhibitor, since its modification or replacement with a hydroxyl group or a sulfonyl group (dacinostat or belinostat) weakened the ic50 values by ~ -fold or -fold. for ebs analogs, compounds ( - ) lacking the se atom largely lost inhibitory activity toward jbu and hpu (figure b and table s ). furthermore, dibenzyl diselenide was also inactive toward both ureases, indicating that the se-containing benzisoxazole moiety, rather than the se atom alone, might be essential for the inhibition. indeed, se-containing
benzisoxazole ( ) showed potent inhibition of both ureases (ic50 ~ . and . m for jbu and hpu, respectively). the introduction of an electron-donating group to the benzisoxazole moiety strongly reduced the potency ( ; ic50 ~ . m for jbu and more than m for hpu; figure b). in contrast, the addition of electron-withdrawing groups to the nitrogen or se atom of the benzisoxazole moiety, i.e., or ebs oxide, seemed to enhance the potency against jbu by a maximum of three-fold ( ). similarly, weakening the electron-withdrawing effect of the substituent on the isoindole dione core of captan, the active moiety (figure c), was also found to decrease the potency (figures ; table s ). taken together, these data indicate that the se-containing benzisoxazole or the isoindole dione moiety played a crucial role in the potency of these kinds of inhibitors, the se or n atom of which was subjected to nucleophilic attack by the thiol group of cys and formed the se-s or n-s bond.

discussion

in the present study, we identified four clinically used drugs, i.e., panobinostat and dacinostat (two anti-cancer drugs), ebs (an anti-stroke or anti-bipolar drug) and disulfiram (an alcohol-deterrent drug), that could protect gastric cells from infection at submicromolar concentrations (table and figure ).
the efficacy of these drugs substantially exceeded that of aha, a well-known urease inhibitor and clinically used drug for bacterial infections. they also seemed to be more effective than tinidazole, a metronidazole-type antibiotic in the classic triple regimen for h. pylori (figures ). moreover, panobinostat, ebs and disulfiram have been administered to humans and do not incur severe side effects( , , ). additionally, these drugs did not affect the viability of mammalian cells at a concentration up to m or m (figure s b), suggesting that they have a rather safe profile in cells and in vivo. taken together, our study, armed with the newly developed hts assay for urease, repositions four clinically used drugs as new advanced leads for the treatment of h. pylori infection. the mode of action of panobinostat, dacinostat, ebs or disulfiram was found to be inhibition of h. pylori urease and a reduction of the production of nh3 in culture (table ; figures s a, b, c and c), both of which are well-known bacterial virulence factors( ). panobinostat and dacinostat are reversible hydroxamic acid-type inhibitors of urease, and were more than or -fold more potent than their analog aha (table ). these largely improved inhibitors indeed enhanced the protection against h. pylori infection in the cell-based infection model (figures a and c), demonstrating that pharmacologically targeting urease could offer an effective treatment for h. pylori and that hpu is a validated pharmacological drug target. however, suppression of the urease activity with these potent inhibitors of hpu could not retard the growth of h.
pylori in culture, indicating that urease is not crucial for bacterial growth. moreover, ebs was found to irreversibly inhibit urease by covalently modifying an allosteric cys residue outside of the active site (figures a and ). the newly identified covalent allosteric regulation of the activity and stability of urease by ebs and captan may explain why these inhibitors could potently and persistently inhibit urease activity and the growth of h. pylori even in the presence of high concentrations of urea substrate (figure s b), two merits that are observed for covalent allosteric drugs( ). indeed, when compared with the reversible inhibitor aha, ebs displayed an ~ and -fold improved potency for jbu and hpu, respectively, and a long-acting inhibitory effect on the endogenous activity of urease and on the growth and infection of h. pylori in culture (figures c-d, b-c and s b). importantly, the anti-h. pylori mic values of ebs and its analogs, i.e. ebs oxide, , , , seem to be much better than, or at least comparable to, those of metronidazole or clarithromycin, the two antibiotics in the classic triple regimen for h. pylori (table s )( ), indicating that these newly validated chemical moieties for inhibiting the growth of h. pylori are promising antibiotic leads for developing new treatments for urease-containing pathogens. since the urease activity is dispensable for the growth of h. pylori (see our discussion of the mode of action of panobinostat and dacinostat), this finding indicates that the effect of ebs-type inhibitors on the growth of h.
pylori goes beyond the inhibition of urease activity alone. in summary, we identified five clinical drugs as submicromolar inhibitors of plant or bacterial urease by performing the first hts campaign for urease. the clinically used drugs panobinostat, dacinostat, ebs and disulfiram inhibit the virulence of h. pylori in a gastric-cell-based infection model. this study provides a new hts assay, drug leads and a regulatory mechanism for developing bioactive urease inhibitors for the treatment of h. pylori infection, especially with antibiotic-resistant strains.

experimental procedures

materials

jack bean urease (jbu), dmso, and dithiothreitol (dtt) were purchased from sigma (steinheim, germany). hypochlorous acid, sodium nitroprusside, salicylate, potassium sodium tartrate, urea, sodium hydroxide, bovine serum albumin, triton x- , l-histidine and l-cysteine were purchased from sangon (shanghai, china). nessler's reagent was purchased from jiumu company (tianjin, china). acetohydroxamic acid was purchased from medchemexpress (monmouth junction, nj). columbia blood agar plate, liquid medium powder for h. pylori, bacteriostatic agent and polymyxin b were purchased from hopebio company (shandong, china). rpmi medium and fetal bovine serum (fbs) were purchased from gibco (invitrogen, gaithersburg, md). the other materials were purchased from the indicated commercial sources or were from sigma.
construction of the high-throughput screening assay for urease

the assay was constructed to measure the activity of urease based on a -tandem microwell plate, which we had previously developed to detect the h2s gas generated by h2s-generating enzymes( , ). phosphate or tris buffer at various ph values was used to determine the optimal ph for jbu in the presence of mm urea substrate (figure s e). the optimal conditions were found to be mm phosphate buffer at ph . . moreover, the suitable detection reagent and enzyme concentrations were determined by testing three types of nh3 detection reagents with various concentrations of jbu or hpu, i.e., salicylic acid-hypochlorite, nessler’s reagent and phenol red detection reagent (figures s a-c). the optimized conditions for the standard assay were found to be salicylic acid-hypochlorite and commercial nessler’s detection reagents (jiumu, tianjin, china) for jbu and hpu, respectively, in the presence of nm jbu or - nm hpu, mm urea, m nicl2, and mm phosphate buffer (final concentrations; ph . ). the salicylic acid-hypochlorite detection reagent contained . mm hypochlorite, mm sodium hydroxide, mm salicylic acid, mm potassium sodium tartrate and . mm sodium nitroprusside. the assay was performed using multichannel pipettes to add μl of each compound (solubilized in dmso or h2o) and μl of the enzyme mix ( nm, m tris, ph . ) into the reaction well (figure a), followed by a -min incubation.
after addition of l of salicylic acid-hypochlorite or nessler’s detection reagent to the detection well, l substrate solution ( mm urea, m nicl , . % bovine serum albumin (w/v)) was mixed with the enzyme in the reaction well. the reaction was monitored at °c, and the absorbance at nm or nm was accordingly measured at the appropriate time points in a microplate reader (synergy from biotek, winooski, vt). primary screening of urease inhibitors using a high-throughput assay we screened , compounds of fda or fad-approved drugs from johns hopkins clinical compound library (jhccl, baltimore, md) or from topscience biotech co. ltd. (shanghai, china) at μm for the inhibition of jbu under standard assay conditions with salicylic acid-hypochlorite detection reagent as described above. the z’ .cc-by-nc-nd . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / value of the screening assay was calculated from negative samples ( % dmso) and positive samples ( m aha) and found to be more than . ( ), indicating the assay is an excellent assay. routinely, negative samples and positive samples were used to determine the assay performance, and screening data with a minimum z’ value of . were accepted. compounds that show more than % inhibition were selected for the further validation. primary hits were defined as that compound is free of heavy metal atom and shows a more than % inhibition at m. 
compounds used for follow-up studies

all hits identified from the primary screening and their analogs were reordered as powders of the highest available purity from commercial sources or synthesized in-house for the following studies: dose-response studies, kinetic studies, biophysical assays, lc-ms/ms analysis, and cell- or bacteria-based studies. panobinostat and dacinostat were bought from adooq (catalog number: a for panobinostat, a for dacinostat). ebs and captan were purchased from sigma (catalog number: e for ebs, for captan). disulfiram (tetraethylthiuram disulfide) was purchased from tci chemicals (b ). captafol ( st ) was purchased from alta scientific co.,ltd (tianjin, china), and dibenzyl diselenide (catalog number: b ) was purchased from alfa aesar (ward hill, ma). abexinostat (catalog number: hy- ), belinostat (hy- ), vorinostat (hy- ), ricolinostat (hy- ), ilomastat (hy- ) and pracinostat (hy- ) were bought from medchemexpress. the purities of these commercially available primary leads or analogs of leads, as well as of the in-house synthesized ebs derivatives, were confirmed to be at least % by hplc (for details, see below), with the exception of ebs, the purity of which was determined by the supplier using combustion analysis. all the hplc spectra and the combustion analysis data for these inhibitors, which were determined either by the commercial supplier or by ourselves, are included in the supporting information (see below).
determination of ic50 values

the ic50 values of all the hits or their analogs, as well as of aha, for the activity of jbu, hpu or oau were determined according to the standard assay conditions described above. compounds were incubated with the enzyme and assayed at a series of concentrations (at least steps of doubling dilution). similarly, the ic50 values of these inhibitors for hcbs or hcse were determined accordingly( ). sigmoidal curves were fitted using the standard protocol provided in graphpad prism (graphpad software, san diego ca). ic50 was calculated by semilogarithmic graphing of the dose-response curves.

aggregation-based assay

to exclude the possibility that the inhibitors suppress the activity of urease via colloidal aggregation, we performed an aggregation-based assay in the presence of nonionic detergents( ). freshly prepared triton x- (sangon, shanghai, china) at concentrations of . %, . %, . %, . %, and . % was first tested for its effects on the activity of jbu under standard assay conditions. subsequently, the inhibitory effects of panobinostat, dacinostat, ebs, captan and disulfiram, as well as the analogs of ebs, in the in vitro jbu activity assay were determined in the presence of . % triton x- , a concentration that alone has no inhibitory effect on the activity of jbu.

reversibility assay

to illustrate the mode of action of the urease inhibitors, we performed a rapid-dilution experiment.
after incubation with panobinostat at a concentration of m, dacinostat at m, ebs or captan at , , or μm for min, jbu ( m) was diluted -fold in the assay buffer. after a further incubation of , , . , , , or h, the remaining activity of jbu was measured (methods). the inhibitor concentrations after dilution are indicated in the figure.

determination of kinact or ki parameters for irreversible inhibitors

the ic values of ebs or captan for jbu were measured after different preincubation periods with the enzyme, i.e., , , , , , , , or min. the kinact and ki values for ebs or captan were obtained by nonlinear regression of the time-dependent ic data as previously reported ( ).

enzyme kinetics

the reaction rate was determined with jbu at the indicated concentrations of panobinostat, dacinostat, ebs or captan against increasing concentrations of urea substrate ( . , . , . , , , , mm for panobinostat and dacinostat; . , , , , mm for ebs and captan). the data were fitted to the michaelis-menten inhibition equation to determine the competitive and noncompetitive inhibition parameters ki and αki, respectively, using graphpad prism (table , figures c and b) ( ). to illustrate the inhibition type, lineweaver-burk plots of these inhibitors were drawn and analyzed.

lc-ms/ms analysis

jbu at a concentration of . m was incubated with dmso, m ebs or m captan for min at room temperature. then, three aliquots of g samples from the inhibitor-treated jbu or purified hpu (fraction in figure b) were digested separately with three proteases, including . 
l trypsin ( gl, . l gluc ( glor . l subtilisin ( glovernight. the proteolytic peptides were combined and desalted on c spin columns and dissolved in buffer a ( . % formic acid in water) for lc-ms/ms analysis. the peptides were separated on a -cm c reverse-phase column ( μm × μm) at a flow rate of nl/min, with a -min linear gradient of buffer b ( . % formic acid in acetonitrile) from % to %. the ms/ms analysis was performed on the q-exactive orbitrap mass spectrometer (thermo fisher scientific, san jose, ca) using standard data acquisition parameters as described previously( ). the mass spectral raw files were searched against the protein database derived from the standard sequence of jbu, hpu or the proteome of h. pylori using proteome discovery . software (thermo fisher scientific, san jose, ca), with a differential modification of . m/z in the case of ebs and . m/z in the case of captan. surface plasmon resonance assays .cc-by-nc-nd . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / the direct interactions between panobinostat, dacinostat, ebselen or captan and jbu were observed by the surface plasmon resonance (spr) experiment with a biacore t (ge healthcare, uppsala, sweden). jbu was immobilized on the surface of the cm sensor chip via the amino-coupling kit. the working solution used for the spr assay was pbs-p ( mm na hpo , . mm kh po , . mm kcl, and mm nacl in presence of % dmso, ph . ). to determine the affinity of the inhibitors toward jbu, panobinostat, dacinostat, ebs or captan were diluted to specific concentrations with pbs-p buffer (for panobinostat: , . . , . , . m; dacinostat: , , , . . , . , . 
m; ebs: , , , , . , or . nm; for captan: . , . , . , . , . , . or . nm) and applied to the jbu-coated chips. the kd values were calculated with biacore evaluation software (version . ).

molecular modeling

the crystal structures of ureases were obtained from the protein data bank (pdb code: goa for jbu; pdb code: e y for hpu). the binding modes of panobinostat or dacinostat were generated using the cdocker module of the discovery studio software (version . ; accelrys, san diego, ca). alternatively, autodock vina was initially used to dock ebs or captan into the respective cys-containing allosteric site of jbu to obtain appropriate configurations, in which the reactive motifs of the compounds (the se-containing benzisoxazole of ebs and the isoindole dione moiety of captan) fall within the distance restraint of one covalent bond to the sulfur atom of the reactive cys residue. the se-s bond or the n-s bond for isoindole dione was then manually incorporated using the discovery studio . software (accelrys, san diego, ca). subsequently, molecular dynamics simulation was performed with amber software and the ff .r force field ( ). to relieve any steric clash in the solvated system, initial minimization with the frozen macromolecule was performed using -step steepest descent minimization and , -step conjugate gradient minimization. next, the whole system was subjected to , -step steepest descent minimization and , -step conjugate gradient minimization. after these minimizations, -ps heating and -ps equilibration periods were performed in the nvt ensemble at k. 
finally, the -ns production runs were simulated in the npt ensemble at k. the binding modes for these inhibitors were visually inspected and the best docking mode was selected.

bacterial strains and culture conditions

bacterial strains of h. pylori or o. anthropi were obtained from beinuo life science (shanghai, china). the strains were maintained on columbia blood agar plates (hopebio, shandong, china) containing % defibrinated sheep blood at °c under microaerobic conditions ( % o , % co and % n ), supplied by an anaeropack-microaero gas generator (mitsubishi gas chemical company, japan). after a culture of - days on the plate, the bacterial colonies were scraped into the liquid medium for h. pylori, containing % or % fetal bovine serum and an antibacterial cocktail (composed of mg/l nalidixic acid, mg/l vancomycin, mg/l amphotericin b, mg/l trimethoprim and . mg/l polymyxin b sulfate; beinuo, shanghai, china), and microaerobically incubated for another or days. then, the medium or bacterial cells were collected for subsequent experiments. a single colony of o. anthropi was inoculated into luria-bertani liquid medium (lb), which was supplemented with mg/l ampicillin, mg/l kanamycin and % fbs (invitrogen) and cultured at °c. after the bacterial culture reached an o.d. of . at nm, the bacterial cells were collected by centrifugation for future experiments. the identification of h. pylori and o. 
anthropi strains was carried out by pcr amplification of the urease gene or s rrna with known primers (table s ), lc-ms/ms analysis of proteins in the extracts, the bacterial urease activity assay or gram staining.

s rrna sequencing

one colony from the h. pylori or o. anthropi culture plate was suspended in μl of sterile water, and the dna was liberated by a boiling-freezing method. the s rrna gene was selectively amplified from this crude lysate by pcr using the universal primers f and r, which have been previously described (table s ). the pcr products at ~ bp were sequenced. the resultant s rrna sequences were compared with the standard nucleotide sequences deposited in genbank with the blast program (http://www.ncbi.nlm.nih.gov/blast/). the dna sequences of s rrna extracted from these strains were confirmed to be from h. pylori or o. anthropi.

preparation of crude extracts from the h. pylori and o. anthropi strains for the urease activity assay

for the urease activity assay, h. pylori or o. anthropi was cultured accordingly in ml of broth medium as described above. bacteria were centrifuged at , rpm for min, and the pellet was washed with phosphate-buffered saline (pbs, ph = . ). the pellet was resuspended in ml of pbs in the presence of protease inhibitors (sigma-aldrich, steinheim, germany) and then sonicated for min of cycles ( s run and s rest) using the noncontact ultrasonic rupture device (diagenode, liege, belgium).
the resultant bacterial lysate was centrifuged twice at , rpm for min; the supernatant was collected and desalted using a sephadex g- desalting column (yeli, shanghai, china). the protein in the fractions was separated by % sds-page, and the corresponding protein band for urease was quantified to determine the concentration of ureases by coomassie blue r- (sinopharm, shanghai, china) staining using bovine serum albumin as a standard. the desalted fractions were stored at - °c in the presence of % glycerol until use in the activity assay.

size-exclusion chromatography for the purification of urease from h. pylori

the crude extract from h. pylori was first centrifuged at , rpm for min. one milliliter of supernatant was loaded onto a gel filtration column ( mm × cm; ge healthcare) and eluted with pbs at a rate of . ml/min on an akta explorer fplc workstation (ge healthcare). the protein peaks observed were collected in eppendorf tubes in a volume between . and ml. the collected fractions were separated by page on a % tris-glycine sds-gel and stained with coomassie brilliant blue r- to identify h. pylori urease.

determination of the minimal inhibition concentration and dose-dependent growth-inhibition curve for urease inhibitors

the minimal inhibition concentration (mic) and dose-dependent growth-inhibition curve for the inhibitors on h. pylori were determined using the broth dilution method ( ). briefly, h. pylori was grown to an od nm of . in liquid medium supplemented with % fbs under standard culture conditions. then, μl h. pylori in the diluted culture (od of . 
) was incubated with the inhibitors at final concentrations of , , , , , , , , μg/ml or at indicated concentrations for h. the od nm was measured to calculate the percentage of growth inhibition. the dmso ( % final concentration)-treated h. pylori cultures and culture medium in the absence of bacteria were used as the negative control ( %) and positive control ( %), respectively. the mic was defined as the lowest concentration of inhibitor that inhibited % of bacterial growth. the h. pylori strain was found to be resistant to tinidazole and metronidazole, with an mic greater than g/ml.

bacterial-cell-based assay for measuring the activity of urease in culture

the endogenous activity of hpu in bacterial cultures was determined using the tandem-well-based plate. briefly, μl of h. pylori culture (od nm ~ . ) was treated with panobinostat, dacinostat or ebs as well as ebs analogs for or h at different concentrations ( , . , . , . , , , or μm). then, the bacterial cells were centrifuged, washed and resuspended in assay buffer containing mm urea. finally, the ~ l suspension was added to the reaction well of the tandem-well plate and assessed for the activity of urease with nessler’s reagent under standard assay conditions.

gastric cell infection model of h. pylori

the cell infection model of h. pylori was constructed using the sgc- adenocarcinoma gastric cell line and following an established protocol ( ). briefly, h. pylori was cultured in liquid medium for h. pylori at °c for - days under standard culture conditions (see above). then, h. pylori at a concentration of . 
× cfu/ml was treated with the indicated inhibitors for h in culture. the bacterial suspension together with mm urea was subsequently added to the culture medium of sgc- cells (moi = ), which had been cultured with rpmi medium plus % fbs in a -well plate for one day, and coincubated with the cells for an additional h. cell images were obtained at specific time points prior to and one day after addition of the bacterial culture using image xpress micro® xls (molecular devices, sunnyvale, ca) under a × objective lens. the cell numbers in the images were quantified using image xpress software. the protective effects of the inhibitors were calculated by dividing the number of sgc- cells after the -h treatment by that prior to the treatment ( %) in the same well.

data availability

all data are contained within the manuscript.

conflict of interest

the authors declare no conflicts of interest.

acknowledgements

we thank david sullivan, jun liu and curtis chong of johns hopkins university for providing the johns hopkins clinical compound library. we thank prof. s.c. tao (shanghai center for systems biomedicine, shanghai jiao tong university, shanghai, china) for kindly providing the sgc- cell line. we thank dr. j.r. xu (department of radiology, ren ji hospital, school of medicine, shanghai jiao tong university, shanghai, china) for assisting with the surface plasmon resonance assay experiment.
funding

this work was supported by the national natural science foundation of china ( , ), the natural science foundation of shanghai ( zr ), the shanghai foundation for the development of science and technology ( jc ), and the research fund of medicine and engineering of shanghai jiao tong university (yg qnb ).

author contributions

f.l., j.y., j.y.x., x.y.w. and f.w. designed the study and analyzed the data. f.z.l. and y.x.z. synthesized analogs of the ebs lead. y.y.z. constructed the assay and performed the high-throughput screening. h.q.f. and l.j.l. performed the lc-ms/ms analysis. q.l. and z.p.x. confirmed the inhibitory activity of compounds. s.s.h. performed the molecular simulation. f.l., x.y.w. and f.w. wrote the paper. all authors reviewed the results and approved the final version of the manuscript.

references

maroney, m. j., and ciurli, s. ( ) nonredox nickel enzymes. chemical reviews , - .
ha, n. c., oh, s. t., sung, j. y., cha, k. a., lee, m. h., and oh, b. h. ( ) supramolecular assembly and acid resistance of helicobacter pylori urease. nature structural biology , - .
mazzei, l., cianci, m., benini, s., and ciurli, s. 
( ) the structure of the elusive urease-urea complex unveils the mechanism of a paradigmatic nickel-dependent enzyme. angewandte chemie , - .
mobley, h. l., and hausinger, r. p. ( ) microbial ureases: significance, regulation, and molecular characterization. microbiological reviews , - .
debowski, a. w., walton, s. m., chua, e. g., tay, a. c., liao, t., lamichhane, b., himbeck, r., stubbs, k. a., marshall, b. j., fulurija, a., and benghezal, m. ( ) helicobacter pylori gene silencing in vivo demonstrates urease is essential for chronic infection. plos pathogens , e .
armbruster, c. e., forsyth-deornellas, v., johnson, a. o., smith, s. n., zhao, l., wu, w., and mobley, h. l. t. ( ) genome-wide transposon mutagenesis of proteus mirabilis: essential genes, fitness factors for catheter-associated urinary tract infection, and the impact of polymicrobial infection on fitness requirements. plos pathogens , e .
dunn, b. e., campbell, g. p., perez-perez, g. i., and blaser, m. j. ( ) purification and characterization of urease from helicobacter pylori. the journal of biological chemistry , - .
norsworthy, a. n., and pearson, m. m. ( ) from catheter to kidney stone: the uropathogenic lifestyle of proteus mirabilis. trends in microbiology , - .
mora, d., and arioli, s. ( ) microbial urease in health and disease. plos pathogens , e .
debraekeleer, a., and remaut, h. ( ) future perspective for potential helicobacter pylori eradication therapies. future microbiol , - .
macegoniuk, k., grela, e., palus, j., rudzinska-szostak, e., grabowiecka, a., biernat, m., and berlicki, l. ( ) , -benzisoselenazol- ( h)-one derivatives as a new class of bacterial urease inhibitors. journal of medicinal chemistry , - .
diaz-sanchez, a. g., alvarez-parrilla, e., martinez-martinez, a., aguirre-reyes, l., orozpe-olvera, j. a., ramos-soto, m. a., nunez-gastelum, j. a., alvarado-tenorio, b., and de la rosa, l. a. ( ) inhibition of urease by disulfiram, an fda-approved thiol reagent used in humans. 
molecules , e .
yu, x. d., zheng, r. b., xie, j. h., su, j. y., huang, x. q., wang, y. h., zheng, y. f., mo, z. z., wu, x. l., wu, d. w., liang, y. e., zeng, h. f., su, z. r., and huang, p. ( ) biological evaluation and molecular docking of baicalin and scutellarin as helicobacter pylori urease inhibitors. journal of ethnopharmacology , - .
xiao, z. p., peng, z. y., dong, j. j., deng, r. c., wang, x. d., ouyang, h., yang, p., he, j., wang, y. f., zhu, m., peng, x. c., peng, w. x., and zhu, h. l. ( ) synthesis, molecular docking and kinetic properties of beta-hydroxy-beta-phenylpropionyl-hydroxamic acids as helicobacter pylori urease inhibitors. european journal of medicinal chemistry , - .
yang, x., koohi-moghadam, m., wang, r., chang, y. y., woo, p. c. y., wang, j., li, h., and sun, h. ( ) metallochaperone ureg serves as a new target for design of urease inhibitor: a novel strategy for development of antimicrobials. plos biology , e .
malfertheiner, p., megraud, f., o'morain, c. a., atherton, j., axon, a. t., bazzoli, f., gensini, g. f., gisbert, j. p., graham, d. y., rokkas, t., el-omar, e. m., and kuipers, e. j. ( ) management of helicobacter pylori infection--the maastricht iv/ florence consensus report. gut , - .
malfertheiner, p., megraud, f., o'morain, c. a., gisbert, j. p., kuipers, e. j., axon, a. t., bazzoli, f., gasbarrini, a., atherton, j., graham, d. y., hunt, r., moayyedi, p., rokkas, t., rugge, m., selgrad, m., suerbaum, s., sugano, k., and el-omar, e. m. ( ) management of helicobacter pylori infection-the maastricht v/florence consensus report. gut , - .
graham, d. y., and shiotani, a. ( ) new concepts of resistance in the treatment of helicobacter pylori infections. nature clinical practice. gastroenterology & hepatology , - .
pierce, c. w. h., e. l.; sawyer, d. t. ( ) quantitative analysis, john wiley & sons, new york.
zhang, q., tang, x., hou, f., yang, j., xie, z., and cheng, z. ( ) fluorimetric urease inhibition assay on a multilayer microfluidic chip with immunoaffinity immobilized enzyme reactors. analytical biochemistry , - .
t. t. ngo, a. p. h. p., c. f. yam, and lenhoff. ( ) interference in determination of ammonia with the hypochlorite-alkaline phenol method of berthelot. anal chem , - .
tarsia, c., danielli, a., florini, f., cinelli, p., ciurli, s., and zambelli, b. ( ) targeting helicobacter pylori urease activity and maturation: in-cell high-throughput approach for drug discovery. biochimica et biophysica acta. general subjects , - .
alonso, c. a., kwabugge, y. a., anyanwu, m. u., torres, c., and chah, k. f. ( ) diversity of ochrobactrum species in food animals, antibiotic resistance phenotypes and polymorphisms in the blaoch gene. fems microbiology letters .
zhou, y., yu, j., lei, x., wu, j., niu, q., zhang, y., liu, h., christen, p., gehring, h., and wu, f. ( ) high-throughput tandem-microwell assay identifies inhibitors of the hydrogen sulfide signaling pathway. chemical communications , - .
croppi, g., zhou, y., yang, r., bian, y., zhao, m., hu, y., ruan, b. h., yu, j., and wu, f. ( ) discovery of an inhibitor for bacterial -mercaptopyruvate sulfurtransferase that synergistically controls bacterial survival. cell chem biol , - .
upvan narang, p. n. p., and frank v. bright. ( ) a novel protocol to entrap active urease in a tetraethoxysilane-derived sol-gel thin-film architecture. chem. mater. , - .
bloomster, t. g., and lynn, r. j. ( ) effect of antibiotics on the dynamics of color change in ureaplasma urealyticum cultures. journal of clinical microbiology , - .
skrott, z., mistrik, m., andersen, k. k., friis, s., majera, d., gursky, j., ozdian, t., bartkova, j., turi, z., moudry, p., kraus, m., michalova, m., vaclavkova, j., dzubak, p., vrobel, i., pouckova, p., sedlacek, j., miklovicova, a., kutt, a., li, j., mattova, j., driessen, c., dou, q. p., olsen, j., hajduch, m., cvek, b., deshaies, r. j., and bartek, j. ( ) alcohol-abuse drug disulfiram targets cancer via p segregase adaptor npl . nature , - .
krippendorff, b. f., neuhaus, r., lienau, p., reichel, a., and huisinga, w. ( ) mechanism-based inhibition: deriving k(i) and k(inact) directly from time-dependent ic( ) values. journal of biomolecular screening , - .
lieberman, o. j., orr, m. w., wang, y., and lee, v. t. ( ) high-throughput screening using the differential radial capillary action of ligand assay identifies ebselen as an inhibitor of diguanylate cyclases. acs chemical biology , - .
goldie, j., veldhuyzen van zanten, s. j., jalali, s., richardson, h., and hunt, r. h. ( ) inhibition of urease activity but not growth of helicobacter pylori by acetohydroxamic acid. journal of clinical pathology , - .
singh, n., halliday, a. c., thomas, j. m., kuznetsova, o. v., baldwin, r., woon, e. c., aley, p. k., antoniadou, i., sharp, t., vasudevan, s. r., and churchill, g. c. ( ) a safe lithium mimetic for bipolar disorder. nature communications , .
chari, a., cho, h. j., dhadwal, a., morgan, g., la, l., zarychta, k., catamero, d., florendo, e., stevens, n., verina, d., chan, e., leshchenko, v., lagana, a., perumal, d., mei, a. h., tung, k., fukui, j., jagannath, s., and parekh, s. 
( ) a phase study of panobinostat with lenalidomide and weekly dexamethasone in myeloma. blood advances , - .
nussinov, r., and tsai, c. j. ( ) the design of covalent allosteric drugs. annual review of pharmacology and toxicology , - .
hancock, r. e. ( ) peptide antibiotics. lancet , - .
zhang, j. h., chung, t. d., and oldenburg, k. r. ( ) a simple statistical parameter for use in evaluation and validation of high throughput screening assays. journal of biomolecular screening , - .
irwin, j. j., and shoichet, b. k. ( ) docking screens for novel ligands conferring new biology. journal of medicinal chemistry , - .
wei, w., mao, a., tang, b., zeng, q., gao, s., liu, x., lu, l., li, w., du, j. x., li, j., wong, j., and liao, l. ( ) large-scale identification of protein crotonylation reveals its role in multiple cellular functions. journal of proteome research , - .
maier, j. a., martinez, c., kasavajhala, k., wickstrom, l., hauser, k. e., and simmerling, c. ( ) ff sb: improving the accuracy of protein side chain and backbone parameters from ff sb. j chem theory comput , - .
palacios-espinosa, j. f., arroyo-garcia, o., garcia-valencia, g., linares, e., bye, r., and romero, i. ( ) evidence of the anti-helicobacter pylori, gastroprotective and anti-inflammatory activities of cuphea aequipetala infusion. journal of ethnopharmacology , - .

figure development of a new high-throughput assay for urease and the discovery of new urease inhibitors. (a) diagram of the tandem-well-based assay for the nh -producing enzyme. 
the procedures for the assays and the cross-section of a tandem-well are shown. blue, the reaction reagent; red, the detection reagent for nh . (b) validation of the urease assay with the known inhibitor aha. (c) well-to-well reproducibility of the -tandem-well-based assay for urease. ●, % dmso (control, %); ■, μm aha; ▲, μm aha (n = ). (d) high-throughput inhibitor screening for jbu with -tandem-well plates. compound concentration: m. (e-f) dose-dependent effects of panobinostat, dacinostat, ebs, captan and disulfiram on the activity of jbu (e), human cbs (f) or human cse (f). means ± sds (n = ). all experiments except the primary screening (d) were independently repeated at least twice, and one representative result is presented.
figure panobinostat, dacinostat, ebs and captan inhibit the activity of jbu. (a) panobinostat and dacinostat are reversible inhibitors, whereas ebs and captan are covalent inhibitors or slow-binding inhibitors toward jbu. means ± sds (n = ). (b) effects of the incubation period on the ic values of panobinostat and dacinostat toward jbu. panobinostat and dacinostat were preincubated with jbu for the indicated times before performing the standard assay to analyze their inhibitory effects. means ± sds (n = ). (c) inhibition of jbu by panobinostat or dacinostat as a function of urea concentration. ki values for panobinostat and dacinostat, . μm and . μm, respectively. means ± sds (n= ). (d) surface plasmon resonance assay analysis of the binding of panobinostat or dacinostat to jbu. kd values were calculated using biacore evaluation software and are listed in table . (e) the putative binding mode of panobinostat or dacinostat in the jbu active site. panobinostat and dacinostat were docked into the jbu crystal structure (pdb code: goa) using the discovery studio software. residues surrounding the inhibitor within a distance of . Å are shown in gray, and hydrogen bonds are represented as green dotted lines. the experiments were independently repeated at least twice, and one representative result is presented.

figure ebs or captan allosterically inhibits the activity of urease by covalently modifying a non-active-site cys residue. (a) the synergistic inhibitory effects of the combinations of ebs, captan or aha. 
a dose-dependent synergistic effect of the combination of ebs at the indicated concentrations with m captan was observed (right panel). data are presented as percentages of the controls (dmso and m captan alone in the left panel and right panel, respectively, %). means ± sds (n= ). (b) inhibition of jbu by ebs or captan as a function of the urea concentration. αki for ebs and captan, . μm and . μm, respectively. means ± sds (n= ). (c) tandem mass spectrometry analysis of the modification site of ebs and captan on jbu. the cys modifications by ebs and captan on jbu are illustrated in the right panels. (d) surface plasmon resonance assay analysis of the binding of ebs or captan to jbu. (e) the potential binding modes of ebs and captan in jbu. ebs and captan were modeled into the respective allosteric sites presented in the crystal structure of jbu (pdb code: goa; methods). the residues within . Å surrounding the ebs and captan are shown. hydrogen bonds are indicated as dashed green lines. the experiments were independently repeated at least twice, and one representative result is presented.
figure urease inhibitors suppress bacterial ureases or the growth of urease-containing bacteria. (a) dose-dependent effects of panobinostat, dacinostat, ebs, captan, disulfiram and aha on the activity of h. pylori urease (hpu, upper panel) or o. anthropi urease (oau, lower panel) in vitro. (b) panobinostat, dacinostat, ebs, captan and disulfiram inhibit the activity of hpu purified by size-exclusion chromatography. the chromatogram of the purification is shown in the left panel. the collected fractions (numbers - ) of the peaks (left panel), as well as the crude extract (number ), were separated by % sds-page and stained with coomassie brilliant blue r- (middle panel). the arrows indicate the peak of h. pylori urease (left panel) or subunit a or b of h. pylori urease (middle panel). the collected sample containing the urease (number ) was tested to evaluate the inhibitory effects of the indicated compounds (right panel). the protein identity of fraction was analyzed by lc-ms/ms (methods and figure s ). (c) the inhibitory effects of panobinostat, dacinostat and newly synthesized ebs analogs ( , and ) on the activity of hpu in culture. inhibitors were incubated with the h. pylori bacteria for h. (d) the effects of panobinostat, dacinostat, ebs and its derivatives on the growth of h. pylori. mean ± sd (n= ). all experiments were independently repeated at least twice, and one representative result is presented.
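dose-response data such as those in panel (a) are commonly summarized by the concentration that leaves half of the control activity. the following is a minimal sketch of one way to estimate that value, assuming residual activity is expressed as a percentage of the control and that log-linear interpolation between the two bracketing doses is acceptable. it is not the curve fitting used for the figures, and the function name `estimate_ic50` and all example numbers are hypothetical.

```python
import math


def estimate_ic50(concentrations, percent_activity):
    """Estimate the concentration giving 50% residual activity.

    Assumptions (not taken from the source): concentrations are positive
    and sorted in ascending order, and activity is given as percent of
    the control. Interpolation is linear on a log10 concentration axis.
    Returns None when 50% is not reached within the tested range.
    """
    points = list(zip(concentrations, percent_activity))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo == 50.0:
            return float(c_lo)
        # A sign change around 50% means the crossing lies in this interval.
        if (a_lo - 50.0) * (a_hi - 50.0) < 0:
            frac = (a_lo - 50.0) / (a_lo - a_hi)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + frac * (log_hi - log_lo))
    if points and points[-1][1] == 50.0:
        return float(points[-1][0])
    return None


# Hypothetical example: activity falls from 75% to 25% between 1 and 100 units.
example = estimate_ic50([1.0, 100.0], [75.0, 25.0])
```

a four-parameter logistic fit over all doses is generally preferable when enough points are available; the interpolation above only uses the two doses that bracket the half-maximal response.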
figure panobinostat, dacinostat and ebs inhibit the virulence of h. pylori in cultured gastric cells. sgc- cells were infected with hp in the presence of m panobinostat (a), m dacinostat (a), m aha (a), m ebs (b), m disulfiram or m tinidazole (b) for h before capturing the images in bright field by image xpress micro® xls (molecular devices, sunnyvale, ca) under a × objective lens. a representative image for each treatment condition is shown (n = ). scale bars, m. the cell numbers before treatment ( %) or after h of treatment were quantified. (c) the effects of urease inhibitors on the nh amount of the cell culture medium. after the treatment, the amount of nh in the cell medium of the corresponding samples was quantified with nessler’s reagent, and the data are shown as percentages of the control (dmso, %). means ± sds (n= ). statistical analyses were performed using the raw data by one-way anova with bonferroni posttests. n.s., no significance; *, p< . ; **, p< . ; ***, p < . . all experiments were independently repeated twice, and one representative result is presented.
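two calculations recur in these captions: expressing a readout as a percentage of the dmso control, and correcting p-values for multiple comparisons with the bonferroni method. the sketch below illustrates both under stated assumptions. it only shows the bonferroni multiplicity adjustment applied to p-values that are already available, not the full one-way anova or the post-test statistics themselves, and the function names and numbers are hypothetical.

```python
from statistics import mean


def percent_of_control(values, control_values):
    """Express each value as a percentage of the mean control readout."""
    control_mean = mean(control_values)
    if control_mean == 0:
        raise ValueError("control mean must be non-zero")
    return [100.0 * v / control_mean for v in values]


def bonferroni_adjust(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of
    comparisons and cap the result at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical readouts: two treated wells against two control wells.
relative = percent_of_control([5.0, 10.0], [10.0, 10.0])

# Hypothetical unadjusted p-values from three pairwise comparisons.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])
```

the adjustment is conservative: with three comparisons, an unadjusted p-value has to be three times smaller to pass the same significance threshold.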
figure . structure-activity relationships of panobinostat, dacinostat, ebs and captan. (a) the effects of commercially available analogs of panobinostat and dacinostat, newly synthesized ebs derivatives and commercially available ebs or captan analogs on the activity of jbu. dmso, %. mean ± sd (n= ). the experiments were independently repeated at least twice, and one representative result is presented. (b) the illustration charts for the structure-activity relationships of hydroxamic acid analogs, ebs or captan.
table indication, chemical structure, ic , αki, or kd values of urease inhibitors. a from the enzyme kinetic study. b assay was performed in mm tris buffer (ph= . ).
name application structure ic (m); jbu ic (m); hpu ic (m); oau αki or ki (m)a ic (m); hcbs ic (m); hcse kd (m)
panobinostat anticancer n h hn nh o oh . ± . . ± . . ± . . ± . > . > . . ± .
dacinostat anticancer n h n nh o oh ho . ± . . ± . . ± . . ± . > . > . . ± .
ebselen anti-stroke; anti-bipolar se n o . ± . . ± . . ± . . ± . > . . ± . . ± .
captan pharmaceutical excipient; fungicide n o o s ccl . ± . . ± . . ± . . ± . > . > . . ± .
disulfiram alcohol deterrent ch ch ns sn h c h c s s . ± . . ± . . ± . - > . > . -
acetohydroxamic acid urinary tract infections no oh . ± . . ± . b . ± . . ± . . ± . > . > . -
supplementary information
high-throughput tandem-microwell assay for ammonia repositions fda-approved drugs to helicobacter pylori infection
fan liu,a,b,# jing yu,b,# yan-xia zhang,c fangzheng li,a,d qi liu,e yueyang zhou,a shengshuo huang,b houqin fang,f zhuping xiao,e lujian liao,f jinyi xu,d xin-yan wu,c fang wu a,*
a key laboratory of systems biomedicine (ministry of education), shanghai center for systems biomedicine, shanghai jiao tong university, shanghai, , china
b state key laboratory of microbial metabolism, sheng yushou center of cell biology and immunology, school of life science and biotechnology, shanghai jiao tong university, shanghai, , china
c school of chemistry & molecular engineering, east china university of science and technology, shanghai, , china.
d state key laboratory of natural medicines and department of medicinal chemistry, china pharmaceutical university, nanjing, , china
e hunan engineering laboratory for analyse and drugs development of ethnomedicine in wuling mountains, jishou university, hunan, , china
f shanghai key laboratory of regulatory biology, school of life sciences, east china normal university, shanghai, , china.
# these authors contributed equally to this work.
* to whom correspondence may be addressed. email: fang.wu@sjtu.edu.cn
table of contents
experimental procedures
figure s . development and optimization of the high-throughput assay for urease.
figure s . validation of on-target inhibition of panobinostat, dacinostat, ebs, captan and disulfiram on jbu.
figure s . the mode of action of panobinostat, dacinostat and disulfiram in vitro.
figure s . the mode of action of ebs and captan in vitro.
figure s . the identification of hpu from extracts of h. pylori by lc-ms/ms.
figure s . ebs and is a long-acting inhibitor for hpu in culture.
figure s . the effects of inhibitors on the cell viability of gastric sgc- cells and antibiotic resistance of the h. pylori strain.
figure s . the binding modes of inhibitors in ureases.
table s . chemical structures and ic values of ebs or captan analogs for ureases
table s . the minimal inhibitory concentration of urease inhibitors or known antibiotics for inhibiting h. pylori and their ic values in the in cellulo urease assay
table s . chemical structures and ic values of hydroxamic acid-based analogs for ureases
table s . primer sequences.
reference
experimental procedures
synthesis of ebs analogs -
compounds - were synthesized according to literature procedures ( - ), as shown in scheme s . the chemical reagents and solvents were purchased from commercial sources and used without further purification, unless stated otherwise. h nmr spectra for these compounds were recorded with a bruker spectrometer. the chemical shifts of h nmr spectra were referenced to tetramethylsilane (δ . ppm).
scheme s . synthesis of compounds - . reagents and conditions: (a) hcl, nano , °c, . h; (b) na se , °c, h; (c) socl ,
°c, h; (d) r nh , et n, ch cl , rt, . h; (e) br , ch cl , reflux, overnight; (f) cu(no ) .xh o, et n, toluene, reflux.
general procedure for synthesis of compounds , , and (route a). the -aminobenzoic acid or its derivative was treated with hydrochloric acid ( . equiv.) and sodium nitrite ( . equiv.) in water ( . m) at °c to form the corresponding diazonium salt. then, the diazonium salt solution was added dropwise to a solution of na se ( . equiv., freshly prepared from selenium powder and nabh in water) at °c under argon. the stirring was continued at °c for h. after work-up, crude , ’-diseleno-dibenzoic acid was obtained. subsequently, the acid was converted to -(chloroseleno)benzoyl chloride with excess socl and one drop of dmf at °c for h. after the removal of thionyl chloride, the crude compound was obtained and treated with different amines ( . equiv.) and et n ( . equiv.) in ch cl ( . m) under argon to afford products and - , respectively. silica gel column chromatography was used to purify these compounds, and their hplc purity was more than %.
-phenyl- -methoxybenzoisoselen- -one ( ) -methoxy- -aminobenzoic acid and aniline were used to give the compound. h nmr ( mhz, cdcl ): δ . (d, j = . hz, h), . (dd, j = . , . hz, h), . (t, j = . hz, h), . - . (m, h), . (d, j = . hz, h), . (dd, j = . , . hz, h), . (s, h). ms (m/z): . [m+h]+.
benzisoselenol- -one ( ) o-aminobenzoic acid and ammonia were used to give the product. h nmr ( mhz, d -dmso): δ . (br, h), . (d, j = . hz, h), . (dd, j = . , . hz, h), . (td, j = . , . hz, h), . (td, j = . , . hz, h). ms (m/z): . [m+h]+.
-propyl-benzisoselenol- -one ( ) o-aminobenzoic acid and n-propylamine were used to give the product. h nmr ( mhz, cdcl ): δ . (d, j = . hz, h), . (d, j = . hz, h), . (td, j = . , . hz, h), . - . (m, h), . (t, j = . hz, h), . (hex, j = . hz, h), . (t, j = . hz, h). ms (m/z): . [m+h]+.
-methylthio-benzisoseleno- -one ( ) o-aminobenzoic acid and thiourea were used to give the product. h nmr ( mhz, d -dmso): δ . (d, j = . hz, h), . (d, j = . hz, h), . (d, j = . hz, h), . (d, j = . hz, h), . (td, j = . , . hz, h), . (t, j = . hz, h). ms (m/z): . [m-nh ] -.
synthesis of compound . compound was prepared according to route b (scheme s ). , '-dithiobis-benzoic acid was reacted with bromine in ch cl under reflux and argon, and then treated with aniline and et n in ch cl at room temperature. after purification of the crude product by column chromatography, compound was obtained. h nmr ( mhz, cdcl ): δ . (d, j = . hz, h), . - . (m, h), . - . (m, h), . - . (m, h), . (d, j = . hz, h), . (t, j = . hz, h). ms (m/z): . [m]+.
synthesis of compound . compound was synthesized according to route c (scheme s ).
a schlenk tube equipped with a stirrer bar was charged with isoindoline- , -dione, diphenyliodonium salt ( . equiv.) and cu(no ) .xh o ( . equiv.) in dry toluene ( . m) under argon. the mixture was heated to °c, followed by the addition of et n ( . equiv.). after stirring at °c for . h (monitored by tlc), stirring of the resulting mixture was continued at room temperature overnight. then, the mixture was concentrated and the residue was purified by column chromatography. h nmr ( mhz, cdcl ): δ . - . (m, h), . (dd, j = . , . hz, h), . - . (m, h), . - . (m, h). ms (m/z): . [m]+.
hplc method and purity analysis
the purity of compounds - , ebselen oxide or dibenzyl diselenide was analyzed on a waters sunfire silica column ( . × mm; waters, milford, ma) coupled to a waters hplc system (e ). l compound was injected onto the column and separated by a gradient elution [ min: % phase a (hexane), % phase b (isopropyl alcohol); min: % phase a (hexane), % phase b (isopropyl alcohol)] at a flow rate of . ml/min at room temperature. similarly, the purity of compound was resolved on a waters spherisorb cn column ( . × mm, waters). l compound was injected onto the column and analyzed at a flow rate of . ml/min with an isocratic elution of solvent composed of % hexane and % isopropyl alcohol. the absorbance of the compounds was monitored at a wavelength of nm, and the corresponding spectra were recorded and analyzed for the determination of the purity.
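the text does not state how purity was computed from the recorded chromatograms. a common convention, assumed here purely for illustration, is the area-percent method: the integrated area of the main peak divided by the summed area of all integrated peaks at the monitoring wavelength. the function name, peak labels and areas below are hypothetical.

```python
def area_percent_purity(peak_areas, main_peak):
    """Area-percent purity (assumed convention, not confirmed by the source).

    peak_areas: mapping of peak label -> integrated absorbance area.
    main_peak:  label of the peak assigned to the compound of interest.
    """
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * peak_areas[main_peak] / total


# Hypothetical chromatogram with one main peak and one minor impurity peak.
purity = area_percent_purity({"main": 98.0, "impurity": 2.0}, "main")
```

area percent assumes that all components absorb comparably at the chosen wavelength; an impurity with a weak chromophore would be underestimated.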
the purity of ebs analogs, which were newly synthesized in house (compounds - ) or obtained from commercial sources (ebselen oxide and dibenzyl diselenide), was analyzed by hplc (for details, see above).
compound determined purity: > %; retention time: . min
compound determined purity: > %; retention time: . min
compound determined purity: > %; retention time: . min
compound determined purity: > %; retention time: . min
compound determined purity: > %; retention time: . min
compound determined purity: > %; retention time: . min
ebselen oxide (cayman) determined purity: %; retention time: . min
dibenzyl diselenide (cayman) determined purity: > %; retention time: . min
figure s . development and optimization of the high-throughput assay for urease. three types of detection reagents, i.e., salicylic acid-hypochlorite (a), nessler’s reagent (b), and phenol red (c), were used to detect the released nh generated by jbu. the assay was monitored in the presence of various concentrations of jbu and mm urea. the absorbance (o.d.) values at nm, nm or nm were recorded accordingly. (d) standard curve of the absorbance of indophenol blue at nm versus the nh cl concentration. various concentrations of nh cl were mixed with the detection reagent salicylic acid-hypochlorite before measurement of the absorbance at nm in a microplate reader. (e) the ph profile of the activity of jbu. the mm phosphate buffer (■) was used to maintain the ph between and , and mm tris-hcl (●) was used for ph to .
jbu was dissolved in the respective buffers and assayed at a final concentration of nm. (f-g) the comparison between salicylic acid-hypochlorite and nessler’s detection reagent for the detection of hpu activity. the assay was performed to detect the urease activity in the extract from h. pylori with salicylic acid-hypochlorite (left panel) and nessler’s detection reagent (right panel) in the presence of mm urea. data are presented as the mean ± sd (n= ). the curves were fitted to the data points with graphpad prism . all the experiments were independently repeated twice, and one representative result is presented.
figure s . validation of on-target inhibition of panobinostat, dacinostat, ebs, captan and disulfiram on jbu. (a) nh did not interfere with the inhibitors. mm nh ·h o was incubated with various concentrations of panobinostat, dacinostat, ebs, captan or disulfiram in assay buffer. the volatile nh was analyzed with salicylic acid-hypochlorite detection reagent (od nm). (b) triton x- did not affect either the activity of jbu or the inhibition potency of panobinostat, dacinostat, ebs, captan or disulfiram as well as ebs analogs.
various concentrations of triton x- were tested for their effects on the activity of jbu. additionally, the indicated concentrations of panobinostat, dacinostat, ebs, ebs oxide, captan, , , or disulfiram were assayed in the presence or absence of / triton x- (v/v) to determine whether their inhibitory mechanisms occurred via colloidal aggregation (methods)( ). the results are shown as percentages of the respective control (dmso or h o, %). mean ± sd (n= ). all experiments were independently repeated twice, and one representative result is presented.
figure s . the mode of action of panobinostat, dacinostat and disulfiram in vitro. (a) the effect of nicl on the inhibition of jbu by panobinostat or dacinostat. nicl at a concentration of , or m was added to the assay containing various concentrations of panobinostat or dacinostat under standard assay conditions. (b) effects of cysteine and histidine on the inhibition of jbu by panobinostat and dacinostat. the assay samples were incubated with the indicated concentrations of panobinostat or dacinostat in the presence or the absence of m cys or m his. the results are shown as percentages of the control (dmso, %).
(c) reversibility of the inhibition of jbu by disulfiram. after incubation with jbu at , μm for min, disulfiram was diluted -fold in assay buffer. the diluted concentrations of disulfiram are μm and . μm, respectively, which do not inhibit jbu (fig. e). after a further incubation for . h, the remaining activity of jbu was measured accordingly (methods). the effect of nicl on the inhibition of jbu by disulfiram is shown in the right panel. the results are shown as percentages of the respective control (dmso, %). mean ± sd (n= ). all experiments were independently repeated twice, and one representative result is presented.
figure s . the mode of action of ebs and captan in vitro. (a) effects of dithiothreitol on the inhibition of jbu caused by ebs and captan. the assay samples were incubated with m ebs or m captan in the presence or the absence of mm dtt. (b) effects of cysteine and histidine on the inhibition of jbu by ebs and captan. the samples were incubated with the indicated concentrations of ebs or captan in the presence or absence of m cys or m his. (c) the effect of nicl on the inhibition of jbu by ebs. nicl at a concentration of .
, , or m was incubated with the various concentrations of ebs under standard assay conditions. (d) the ic values of ebs and captan toward jbu were linearly correlated with the concentrations of jbu. ebs and captan were incubated with various concentrations of jbu, and the ic values were determined accordingly. (e) the inhibition constants of ki or kinact for irreversible inhibitors were determined according to the methods described in ref. ( ). means ± sds (n= ). all experiments were independently repeated at least twice, and one representative result is presented. .cc-by-nc-nd . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / s figure s . the identification of hpu from extracts of h. pylori by lc-ms/ms. fraction collected by size-exclusion chromatography (figure b) was digested with trypsin, glucand subtilisin, separated from the c reverse-phase column and subjected to analysis with a thermo q exactive orbitrap (thermo fisher scientific). the peptides in red were identified by lc-ms/ms as subunit a or b of h. pylori. the overall coverage of ureb and urea identified in the analysis of lc-ms/ms was . % and . %, respectively. .cc-by-nc-nd . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / s figure s . ebs and is a long-acting inhibitor for hpu in culture. 
(a) disulfiram dose-dependently and selectively inhibits the growth of h. pylori. various concentrations of disulfiram were incubated at °c with h. pylori. (b) the inhibitory effects of ebs and on the activity of hpu in cellulo. ebs, or aha at a concentration of m were incubated with h. pylori bacteria for h. additionally, one batch of the treated bacteria was washed, diluted into freshly prepared medium without the addition of the inhibitors, and cultured for an additional h. the in cellulo urease activities from the cultured cells under the two treatment conditions were determined accordingly (methods). the results are shown as percentages of the control (dmso, %). mean ± sd (n= ). all experiments were independently repeated at least twice, and one representative result is presented.
figure s . the effects of inhibitors on the cell viability of gastric sgc- cells and antibiotic resistance of the h. pylori strain. (a) the h. pylori strain is resistant to treatment with tinidazole or metronidazole. various concentrations of tinidazole or metronidazole were incubated at °c with h. pylori for h under standard culture conditions, and the od at nm was recorded using a spectrophotometer to determine
the cell growth of h. pylori (methods). (b) the effects of urease inhibitors on the viability of mammalian cells. sgc- cells were incubated with dmso or the indicated concentrations of panobinostat, dacinostat, ebs or disulfiram for h in a -well plate before measurement of cell viability using the celltiter ® aqueous one solution cell proliferation assay (promega, madison, wi). the results are shown as percentages of the control (dmso, %). means ± sds (n= ). all experiments were independently repeated at least twice, and one representative result is presented.
figure s . the binding modes of inhibitors in ureases. (a) the putative binding mode of panobinostat (black) or dacinostat (black) in the hpu active site. panobinostat and dacinostat were docked into the hpu crystal structure (pdb code e y; ref. ( )) using the discovery studio software. residues surrounding the inhibitor within a distance of . Å are shown in gray or in the default atom color. (b) global view of the binding region of ebs (upper panel) and captan (lower) in jbu.
in the modeled complex of ebs or captan with the protein (methods and figure e), the protein is shown in black, the key residues (his and his ) in the active site of jbu in cyan, and each inhibitor together with its attached cys residue (cys for ebs, cys for captan; figure e) in red. hydrogen bonds are represented as green dotted lines.
table s . chemical structures and ic values of ebs or captan analogs for ureases. name structure ic (m); hpu ic (m); jbu ic (m); oau se n o o . ± . . ± . . ± . s n o > . . ± . . ± . n o o > . > . > . se nh o . ± . . ± . . ± . se n o > . . ± . . ± . se n o nh s . ± . . ± . . ± . ebselen oxide se n o o . ± . . ± . . ± . dibenzyl diselenide se se > . > . > . captafol n o o s cl cl cl cl > . . ± . . ± .
table s . the minimal inhibitory concentration of urease inhibitors or known antibiotics for inhibiting h. pylori and their ic values in the in cellulo urease assay. compound h. pylori (mic) h. pylori (ic values in the in cellulo urease assay; m) g/ml m ebs . . ± . . . ± . . . ± . . . ± . ebs oxide . . ± . captan . ± . disulfiram . . ± . dibenzyl diselenide > > > . 
aha > > - tinidazole > > - metronidazole > > - mic: minimal inhibitory concentration
table s . chemical structures and ic values of hydroxamic acid-based analogs for ureases. name structure ic (m); hpu ic (m); jbu abexinostat o o n h o hn o oh n . ± . . ± . belinostat o hn s o o nh ho . ± . . ± . vorinostat o nh o hn ho . ± . . ± . ricolinostat n no nh o hn oh n > . > . ilomastat o nh o hn o hn n h oh > . > . pracinostat o hn oh n n n > . > . hydroxylamine h n oh > . > .
table s . primer sequences. no. primer usage '- agagtttgatcctggctcag- ' ' primer for s rrna '- aaggaggtgatccagccgca- ' ' primer for s rrna '- attaatcattagatgtatggccctactacaggcg- ' ' primer for ureb '- aatatactcgagctagaaaatgctaaagagttg- ' ' primer for ureb
reference . pacula, a.
j., obieziurska, m., scianowski, j., kaczor, k. b., and antosiewicz, j. ( ) water-dependent synthesis of biologically active diaryl diselenides. arkivoc, - .
ngo, h. x., shrestha, s. k., green, k. d., and garneau-tsodikova, s. ( ) development of ebsulfur analogues as potent antibacterials against methicillin-resistant staphylococcus aureus. bioorgan med chem , - .
lucchetti, n., scalone, m., fantasia, s., and muniz, k. ( ) sterically congested , -disubstituted anilines from direct c-n bond formation at an iodine(iii) center. angew chem int edit , - .
irwin, j. j., and shoichet, b. k. ( ) docking screens for novel ligands conferring new biology. journal of medicinal chemistry , - .
krippendorff, b. f., neuhaus, r., lienau, p., reichel, a., and huisinga, w. ( ) mechanism-based inhibition: deriving k(i) and k(inact) directly from time-dependent ic( ) values. journal of biomolecular screening , - .
ha, n. c., oh, s. t., sung, j. y., cha, k. a., lee, m. h., and oh, b. h. ( ) supramolecular assembly and acid resistance of helicobacter pylori urease. nature structural biology , - .
insights into genome recoding from the mechanism of a classic + -frameshifting trna
howard gamper , , haixing li , , isao masuda , d. miklos robkis , thomas christian , adam b. conn , gregor blaha , e. james petersson , ruben l.
gonzalez, jr ,#, and ya-ming hou ,#,* department of biochemistry and molecular biology, thomas jefferson university, philadelphia, pa , usa department of chemistry, columbia university, new york, ny , usa department of chemistry, university of pennsylvania, philadelphia, pa , usa department of biochemistry, university of california, riverside, ca , usa these authors contributed equally to this work. #corresponding authors: rlg @columbia.edu (t) - - ; (f) - - ; orcid: - - - ya-ming.hou@jefferson.edu (t) - - ; (f) - - ; orcid: - - - *lead contact: ya-ming hou (ya-ming.hou@jefferson.edu) running title: mechanism of sufb -induced + frameshifting .cc-by-nc-nd . international licenseavailable under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / abstract while genome recoding using quadruplet codons to incorporate non-proteinogenic amino acids is attractive for biotechnology and bioengineering purposes, the mechanism through which such codons are translated is poorly understood. here we investigate translation of quadruplet codons by a + -frameshifting trna, sufb , that contains an extra nucleotide in its anticodon loop. natural post-transcriptional modification of sufb in cells prevents it from frameshifting using a quadruplet-pairing mechanism such that it preferentially employs a triplet-slippage mechanism. we show that sufb uses triplet anticodon-codon pairing in the -frame to initially decode the quadruplet codon, but subsequently shifts to the + -frame during trna-mrna translocation. sufb frameshifting involves perturbation of an essential ribosome conformational change that facilitates trna-mrna movements at a late stage of the translocation reaction. 
our results provide a molecular mechanism for sufb -induced + frameshifting and suggest that engineering of a specific ribosome conformational change can improve the efficiency of genome recoding.
key words: sufb frameshift suppressor trna, + ribosomal frameshifting, quadruplet codon, genome expansion, m g methylation
introduction
the ability to recode the genome and expand the chemical repertoire of proteins to include non-proteinogenic amino acids promises novel tools for probing protein structure and function. while most recoding employs stop codons as sites for incorporating non-proteinogenic amino acids, only two stop codons can be simultaneously recoded due to the cellular need to reserve the third stop codon for termination of protein synthesis. the use of quadruplet codons as additional sites for incorporating non-proteinogenic amino acids has thus emerged as an attractive alternative , . recoding at a quadruplet codon requires a + -frameshifting trna that is aminoacylated with the non-proteinogenic amino acid of interest. the primary challenge faced by this technology has been the low efficiency with which the full-length protein carrying the non-proteinogenic amino acid can be synthesized. one reason for this is the poor recoding efficiency of the + -frameshifting aminoacyl (aa)-trna, and the second is the failure of the + -frameshifting aa-trna to compete with canonical aa-trnas that read the first three nucleotides of the quadruplet codon at the ribosomal aa-trna binding (a) site during the aa-trna selection step of the translation elongation cycle.
while directed evolution by synthetic biologists has yielded + -frameshifting trnas, efficient recoding requires cell lines that have been engineered to deplete potential competitor trnas - . these problems emphasize the need to better understand the mechanism through which quadruplet codons are translated by + -frameshifting trnas.
in bacteria, + -frameshifting trnas that suppress single-nucleotide insertion mutations that shift the translational reading frame to the + -frame have been isolated from genetic studies , . these + -frameshifting trnas typically contain an extra nucleotide in the anticodon loop – a property that has led to the proposal of two competing models for their mechanism of action. in the quadruplet-pairing model, the inserted nucleotide joins the triplet anticodon in pairing with the quadruplet codon in the a site and this quadruplet anticodon-codon pair is translocated to the ribosomal peptidyl-trna binding (p) site . in the triplet-slippage model, the expanded anticodon loop forms an in-frame ( -frame) triplet anticodon-codon pair in the a site and subsequently shifts to the + -frame at some point later in the elongation cycle , , possibly during translocation of the + -frameshifting trna from the a to p sites or within the p site . the triplet-slippage model is supported by structural studies of ribosomal complexes in which the expanded anticodon-stem-loops (asls) of + -frameshifting trnas have been found to use triplet anticodon-codon pairing in the -frame at the a site - and in the + -frame at the p site .
nonetheless, these structures do not eliminate the possibility that two competing triplet pairing schemes ( -frame and + -frame) can co-exist when a quadruplet codon motif occupies the a site , that some amount of + frameshifting can occur via the quadruplet-pairing model, and that the quadruplet-pairing model may even dominate for particular + -frameshifting trnas, codon sequences, and/or reaction conditions . we also do not know how each model determines the efficiency of + frameshifting or whether any competition between the two models is driven by the kinetics of frameshifting or the thermodynamics of base pairing. in addition, virtually all natural trnas contain a purine at nucleotide position on the '-side of the anticodon (http://trna.bioinf.uni-leipzig.de/), which is invariably post-transcriptionally modified and is important for maintaining the translational reading frame in the p site . while most + -frameshifting trnas sequenced to date also contain a purine nucleotide at position , we do not know whether it is post-transcriptionally modified or how the modification affects + frameshifting. perhaps most importantly, while the structural studies described above provide snapshots of the initial and final states of + frameshifting, they do not reveal where, when, or how the shift occurs, thereby precluding an understanding of the structural basis and mechanism of + frameshifting. these open questions have limited our ability to increase the efficiency of genome recoding at quadruplet codons.
to address these questions, we have investigated the mechanism of + frameshifting by sufb (figure a), a + -frameshifting trna that was isolated from salmonella typhimurium as a suppressor of a single c insertion into a proline (pro) ccc codon . the observed high + -frameshifting efficiency of sufb at the ccc-c motif, nearly -fold above background , demonstrates its ability to successfully compete with the naturally occurring prol and prom isoacceptor trnas that read the ccc codon. using the ensemble ‘codon-walk’ methodology and single-molecule fluorescence resonance energy transfer (smfret), we have compared the + frameshifting activity of sufb with that of its closest counterpart, prol, at a ccc-c motif, and determined the position and timing of the shift. our results show that sufb is naturally n -methylated at g in cells, generating an m g that blocks quadruplet pairing and forces sufb to use -frame triplet anticodon-codon pairing to decode the quadruplet codon at the a site. additionally, we find that sufb , and likely all + -frameshifting trnas, shifts to the + -frame during the subsequent translocation reaction in which the translational gtpase elongation factor (ef)-g catalyzes the movement of sufb from the a to p sites (i.e., a triplet-slippage mechanism). more specifically, we show that this frameshift occurs in the later steps of translocation, during which ef-g catalyzes a series of conformational rearrangements of the ribosomal pre-translocation (pre) complex that enable the trna asls and their associated codons to move to their respective post-translocation positions within the ribosomal small ( s in bacteria) subunit - . thus, efforts to increase the recoding efficiency of + -frameshifting trnas should focus on enforcing a triplet anticodon-codon pairing in the -frame at the a site and directed evolution to optimize conformational rearrangements of the ribosomal pre complex during the late stages of translocation.
results
native-state sufb is n -methylated at g and is readily aminoacylated with pro
sufb contains an extra g a nucleotide inserted between g and u of prol (figure a). whether the extra g a is methylated and how it affects methylation of g is unknown. we thus determined the methylation status of the g -g a motif using rnase t cleavage inhibition assays and primer extension inhibition assays. we first generated a plasmid-encoded sufb by inserting g a into an existing tac-inducible plasmid encoding escherichia coli prol , which has an identical sequence to s. typhimurium prol. we then expressed and purified the plasmid-encoded sufb and prol from an e. coli prol knock-out (prol-ko) strain containing all the endogenous enzymes necessary for processing sufb and prol to their s. typhimurium native states such that they possess the full complement of naturally occurring post-transcriptional modifications (termed the native-state trnas). in addition, we prepared in vitro transcripts of sufb and prol lacking all post-transcriptional modifications (termed the g -state trnas), or enzymatically methylated with purified e. coli trmd , such that they possess only the n -methylation at g and no other post-transcriptional modifications (termed the m g -state trnas).
in the case of sufb , rnase t cleavage inhibition assays demonstrated cleavage at g and g a of the g -state trna, but inhibition of cleavage at either position upon treatment with trmd (figure b), indicating that both nucleotides are n -methylated in the m g -state trna.
primer extension inhibition assays, which were previously validated by mass spectrometry analysis , showed inhibition of extension at g and g a in m g - and native-state sufb (figure c), confirming that both nucleotides are n -methylated in these species. notably, n methylation shifted almost entirely to g in native-state sufb , indicating that m g is the dominant methylation product in cells. as a control, no inhibition of extension at g or g a was observed for g -state sufb . complementary kinetics experiments showed that the yield and rate of n -methylation of g -state sufb were similar to those of g -state prol (figure d). likewise, kinetics experiments revealed that the yield and rate of aminoacylation of native-state sufb with pro were similar to those of native-state prol (figure e). in contrast, aminoacylation of g -state sufb was inhibited (figure f). these results demonstrate that the native-state sufb synthesized in cells is quantitatively n -methylated to generate m g and is readily aminoacylated with pro.
sufb promotes + frameshifting using triplet-slippage and possibly other mechanisms
we next determined the mechanism(s) through which sufb promotes + frameshifting in a cellular context. we created a pair of isogenic e. coli strains expressing sufb or prol from the chromosome in a trmd-knockdown (trmd-kd) background .
this background strain was designed to evaluate the effect of m g on + frameshifting and it was generated by deleting chromosomal trmd and controlling cellular levels of m g using arabinose-induced expression of the human counterpart trm , which is competent to stoichiometrically n -methylate intracellular trna substrates . the isogenic sufb and prol strains were assayed for + frameshifting in a cell-based lacz reporter assay in which a ccc-c motif was inserted into the nd codon position of lacz such that a + -frameshifting event at the motif was necessary to synthesize full-length β-galactosidase (β-gal) . the efficiency of + frameshifting was calculated as the ratio of β-gal expressed in cells containing the ccc-c insertion relative to cells containing an in-frame ccc insertion. in the m g -abundant (m g +) condition, sufb displayed a high + -frameshifting efficiency ( . %, figure a) relative to prol ( . %). in the m g -deficient (m g –) condition, sufb exhibited an even higher efficiency ( . %) and, consistent with our previous work , prol also displayed an increased efficiency ( . %) relative to background ( . %). because n -methylation in the m g + condition was stoichiometric (figure c), thereby preventing quadruplet-pairing, we attribute the . % efficiency of sufb in this condition exclusively to triplet-slippage. in the m g – condition, we observed an increase in + -frameshifting efficiency of sufb to . %. while multiple mechanisms may account for the increased + frameshifting, one possibility is that sufb uses both triplet-slippage and quadruplet-pairing in this condition.
to confirm our results, we performed similar studies with the isogenic sufb and prol strains on the endogenous e. coli lolb gene, encoding the outer membrane lipoprotein. the lolb gene naturally contains a ccc-c motif at the nd codon position such that + frameshifting at this motif would decrease protein synthesis due to premature termination. as a reference, we used e.
coli cyss, encoding cysteinyl-trna synthetase (cysrs) , which has no ccc-c motif in the first codons and would be less sensitive to + frameshifting at ccc-c motifs during protein synthesis. the ratio of protein synthesis of lolb to cyss for the control sample prol in the m g + condition, measured from western blots (methods), was normalized to . , denoting that lolb and cyss were maximally translated in the -frame without + frameshifting (i.e., a relative + frameshifting efficiency of . ) (figures b, c). in the m g + condition, sufb displayed a ratio of lolb to cysrs of . , indicating an increase in the relative + frameshifting efficiency to . , and in the m g – condition, it displayed a ratio of . , indicating an increase in the relative + frameshifting efficiency to . (figures b, c). similarly, prol in the m g – condition displayed a ratio of lolb to cysrs of . , indicating an increase in the + -frameshifting efficiency to . .
sufb can insert non-proteinogenic amino acids at ccc-c motifs
we next asked whether sufb can deliver non-proteinogenic amino acids to the ribosome by inducing + frameshifting at a ccc-c motif (figure d). we inserted a ccc-c motif at the th codon position of the e. coli fola gene, encoding dihydrofolate reductase (dhfr). a sufb -induced + frameshifting event at the insertion would result in full-length dhfr, whereas the absence of + frameshifting would result in a c-terminally truncated dhfr fragment (dc).
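the two reporter read-outs used in the preceding subsection reduce to simple ratios, and a short calculation makes the arithmetic explicit. the sketch below is illustrative only: the function names and all input values are hypothetical, and the step that converts the normalized lolb/cyss ratio into a relative + frameshifting efficiency (one minus the normalized ratio) is an assumption inferred from the statement that the control ratio is maximal when no + frameshifting occurs.

```python
def lacz_frameshift_efficiency(bgal_ccc_c, bgal_ccc):
    """Percent frameshifting from the lacZ reporter.

    bgal_ccc_c: beta-gal signal from cells carrying the CCC-C insertion
    bgal_ccc:   beta-gal signal from cells carrying the in-frame CCC insertion
    """
    if bgal_ccc <= 0:
        raise ValueError("in-frame reporter signal must be positive")
    return 100.0 * bgal_ccc_c / bgal_ccc


def lolb_relative_frameshift(lolb, cyss, control_lolb, control_cyss):
    """Normalized lolB/cysS ratio and the relative frameshifting efficiency.

    The sample ratio is divided by the control ratio, so the control is 1.0
    by construction. Taking the efficiency as 1 - normalized ratio is an
    assumption made for this sketch, not a formula quoted from the text.
    """
    control_ratio = control_lolb / control_cyss
    normalized = (lolb / cyss) / control_ratio
    return normalized, 1.0 - normalized


if __name__ == "__main__":
    # Hypothetical band intensities in arbitrary units.
    print(lacz_frameshift_efficiency(5.0, 100.0))
    print(lolb_relative_frameshift(2.0, 4.0, 4.0, 4.0))
```

both functions take raw signal intensities in arbitrary units; only the ratios matter, so any consistent unit can be used.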
sufb was aminoacylated with non-proteinogenic amino acids using a flexizyme and subsequently tested in [ s]-met-dependent in vitro translation reactions using the e. coli purexpress system. the resulting protein products were separated by sodium dodecyl sulfate (sds)-polyacrylamide gel electrophoresis and quantified by phosphorimaging. control experiments with no sufb or with a non-acylated sufb showed no full-length dhfr, demonstrating that synthesis of full-length dhfr depended upon sufb delivery of an amino acid as a result of + frameshifting at the ccc-c motif. we showed that sufb was able to deliver pro, arg, val, and the pro analogs cis-hydroxypro, trans-hydroxypro, azetidine, and thiapro (supplementary figure ) to the ribosome in response to the ccc-c motif, and that the efficiency of delivery by g -state sufb was generally higher than that by native-state sufb . notably, the purexpress system contains all canonical trnas, including prol and prom, indicating the ability of sufb to successfully compete with these trnas.
sufb uses triplet pairing in the -frame at the a site
to determine at which step in the elongation cycle sufb undergoes + frameshifting in response to a ccc-c motif, we used an e. coli in vitro translation system composed of purified components and supplemented with requisite trnas and translation factors to perform a series of ensemble rapid kinetic studies.
we began with a gtpase assay that reports on the yield and rate with which the translational gtpase ef-tu hydrolyzes gtp upon delivery of a ternary complex (tc), composed of ef-tu, [g- p]-gtp, and prolyl-sufb (sufb -tc) or prol (prol-tc), to the a site of a ribosomal s initiation complex ( s ic) carrying an initiator fmet-trnafmet in the p site and a programmed ccc-c motif at the a site. the results of these experiments showed that the yield and rate of gtp hydrolysis (kgtp,obs) upon delivery of sufb -tc were quantitatively similar to those of prol-tc for both the native- and g -state trnas (figure a). we next performed a dipeptide formation assay that reports on the synthesis of a peptide bond between the [ s]-fmet moiety of a p-site [ s]-fmet-trnafmet in a s ic and the pro moiety of a sufb - or prol-tc delivered to the a site. this assay revealed that the rate of [ s]-fmet-pro (fmp) formation (kfmp,obs) for sufb -tc was within -fold of that for prol-tc for both the native- and g -state trnas (figure b, table s ).
to test whether native-state sufb -tc can effectively compete with prol-tc for delivery to the a site and peptide-bond formation, we varied the dipeptide formation assay such that an equimolar mixture of each tc was used in the reaction (figure c). since aminoacylation of both trnas with pro would create dipeptides of the same identity (i.e., fmp), we used a flexizyme to aminoacylate them with different amino acids and generate distinct dipeptides. control experiments showed that prol charged with pro or arg (figure c, bars and ) and sufb charged with pro or arg (bars and ) generated the same amount of fmp and fmr, indicating that the amino-acid identity did not affect the level of dipeptide formation. we found that the amount of dipeptide formed by sufb -tc and prol-tc in these competition assays was similar, although the amount formed by sufb -tc was slightly less ( % vs. %), in both the native- (bars - ) and g -state trnas (supplementary figure a). these competition experiments provide direct evidence that sufb -tc effectively competes with prol-tc for delivery to the a site and peptide-bond formation.
collectively, the results of our gtpase-, dipeptide formation-, and competition assays indicate that sufb -tc is delivered to the a site and participates in peptide-bond formation in the same way as prol-tc, suggesting that sufb uses triplet pairing in the -frame at the a site that successfully competes with triplet pairing by prol. to support this interpretation, we measured kfmp,obs in our dipeptide formation assay, using g -state sufb -tc and a series of mrna variants in which single nucleotides in the ccc-c motif were substituted. we showed that kfmp,obs did not decrease upon substitution of the th nucleotide of the ccc-c motif, but that it decreased substantially upon substitution of any of the first three nucleotides of the motif (figure d, supplementary figure b). thus, triplet pairing of sufb to the first three cs of the ccc-c motif is necessary and sufficient for rapid delivery of the trna to the a site and its participation in peptide-bond formation.
the a-site activity of sufb depends on the sequence of the anticodon loop
we next asked how delivery of sufb -tc to the a site and peptide-bond formation depend on the sequence of the sufb anticodon loop.
starting from g -state sufb , we created two variants containing a g-to-c substitution in nucleotide (g c) or (g c) within the anticodon loop and adapted our dipeptide formation assay to measure the fmp yield and kfmp,obs generated by each variant at the ccc-c motif at the a site. we showed that the g c variant resulted in a fmp yield of % and a kfmp,obs of . ± . s– , most likely by triplet pairing of nucleotides - of the anticodon loop with the -frame of the ccc-c motif (figure a). in contrast, the g c variant resulted in a fmp yield of % and a kfmp,obs of . ± . s– , most likely by triplet pairing of nucleotides - of the anticodon loop with the -frame of the ccc-c motif (figure b). our interpretation that nucleotides - of the anticodon loop of the g c variant most likely triplet pair with the -frame of the ccc-c motif is consistent with the observations that the fmp yield and kfmp,obs of the g c variant are similar and -fold higher, respectively, than those of the g c variant. if nucleotides - of the anticodon loop of the g c variant were to form a triplet pair with the ccc-c motif, we would have expected it to pair in the + -frame, which would have most likely reduced the fmp yield and kfmp,obs of the g c variant relative to the g c variant. these results suggest that g -state sufb exhibits some plasticity as to whether it can undergo triplet pairing with anticodon loop nucleotides - or - , consistent with a previous study .
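the dipeptide formation assays above are summarized by two numbers per trna, the fmp yield and kfmp,obs. a minimal sketch of how such a rate constant could be estimated from a time course is given below, assuming a single-exponential approach to a known plateau. the source does not state the fitting model, so the model, the function name estimate_kobs and the synthetic data are all assumptions made for illustration.

```python
import math


def estimate_kobs(times, fractions, plateau):
    """Estimate k_obs assuming y(t) = plateau * (1 - exp(-k_obs * t)).

    Rearranging gives -ln(1 - y/plateau) = k_obs * t, so k_obs is the
    least-squares slope of a line through the origin. Points at t <= 0
    or at/above the plateau cannot be log-transformed and are skipped.
    """
    numerator = 0.0
    denominator = 0.0
    for t, y in zip(times, fractions):
        if t <= 0 or y >= plateau:
            continue
        z = -math.log(1.0 - y / plateau)
        numerator += t * z
        denominator += t * t
    if denominator == 0.0:
        raise ValueError("no usable time points")
    return numerator / denominator


if __name__ == "__main__":
    # Synthetic, noise-free time course (hypothetical values).
    true_k = 2.0      # per second
    plateau = 0.8     # fraction of complexes converted to dipeptide
    times = [0.1, 0.2, 0.5, 1.0]
    fractions = [plateau * (1.0 - math.exp(-true_k * t)) for t in times]
    print(estimate_kobs(times, fractions, plateau))
```

the linearized fit is chosen here only because it needs no external libraries; with noisy data a nonlinear fit that also estimates the plateau would usually be preferable.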
sufb shifts to the + -frame during translocation
although sufb uses triplet pairing in the -frame when it is delivered to the a site, it is a highly efficient + -frameshifting trna (figure ). we therefore asked whether + frameshifting occurs during or after translocation of sufb into the p site. we addressed this question by adapting our previously developed tripeptide formation assays . we rapidly delivered ef-g and an equimolar mixture of g -state sufb -, trnaval-, and trnaarg-tcs to s ics assembled on an mrna in which the nd codon was a ccc-c motif and the rd codon was either a guu codon encoding val in the + frame or a cgu codon encoding arg in the -frame. as soon as translocation of the pre complex and the associated movement of sufb from the a to p sites formed a ribosomal post-translocation (post) complex with an empty a site in these experiments, trnaval- and trnaarg-tc would compete for the codon at the a site to promote formation of an fmpv tripeptide or an fmpr tripeptide. thus, the fmpv yield and kfmpv,obs report on the sub-population of sufb that shifted to the + -frame, whereas the fmpr yield and kfmpr,obs report on the sub-population that remained in the -frame , . the results showed that the yield of fmpv was much higher than that of fmpr ( % vs. %, figure a), demonstrating the high efficiency with which g -state sufb induces + frameshifting. notably, relative to the + frameshifting of prol we have previously reported , kfmpv,obs of sufb ( . s– ) was comparable to the rate of + frameshifting of prol during translocation ( . 
s– ) rather than that of + frameshifting after translocation into the p site (~ – s– ) , indicating that sufb underwent + frameshifting during translocation. our observation that the fmpv yield plateaus at % at long reaction times suggests that the sub-populations of sufb that will shift to the + -frame and remain in the - frame are likely established in the a site, even before ef-g binds to the pre complex. given that sufb exhibits triplet pairing in the -frame at the a site (figures a-c, supplementary table , and supplementary figure a) and shifts into the + -frame during translocation (figure a), the two sub-populations of sufb in the a site seem to differ primarily in their propensity to undergo + frameshifting during translocation. the sub-population that encompasses % of the total would exhibit a high propensity of undergoing + frameshifting during translocation, whereas the sub-population that encompasses % of the total would exhibit a low propensity of undergoing + frameshifting during translocation, preferring instead to remain in the -frame. we next determined whether the % sub-population of g -state sufb that remained in the -frame during translocation could undergo + frameshifting after arrival at the p site. we varied our tripeptide formation assay so as to deliver the tcs in two steps separated by a defined time interval (figure b). in the first step, g -state sufb -tc and ef-g were delivered to the s ic to form a post complex, which was then allowed the opportunity to shift to the + -frame over a systematically increasing time interval. in the second step, an equimolar mixture of trnaarg- and trnaval-tcs was delivered to the post complex. the results showed that fmpv was rapidly formed at a high yield and exhibited a kfmp+v,obs (where the “+” denotes the time interval between the delivery of translation components) that did not increase as a function of time. 
in contrast, fmpr was formed at a low yield and exhibited a kfmp+r,obs that did not decrease as a function of time. together, these results indicate that the sub-population of p site-bound sufb in the -frame does not undergo + frameshifting. this interpretation is supported by the observation that ef-p, an elongation factor which we showed suppresses + frameshifting within the p site , had no effect on the fmpv yield (supplementary figure c and supplementary table ). having shown that + frameshifting of sufb occurs only during translocation, we evaluated the effect of m g on the frequency of this event. we began by delivering g -, m g -, or native-state sufb -tcs together with ef-g to s ics to form the corresponding post complexes and then delivered an equimolar mixture of trnaarg- and trnaval-tcs to each post complex to determine the relative formation of fmpv and fmpr. the results showed that m g - and native-state sufb displayed a reduced fmpv yield and a concomitantly increased fmpr yield relative to g -state sufb (figures c, supplementary figures d-f), consistent with the notion that the presence of m g compromises + frameshifting. we then used the same tripeptide formation assay to determine how + frameshifting during translocation of g -state sufb depends on the identity of the th nucleotide of the ccc-c motif. a series of post complexes were generated by delivering g -state sufb -tcs and ef-g to s ics programmed with a ccc-n motif at the nd codon position. 
each post complex was then rapidly mixed with trnaval-tc to monitor the yield of fmpv and kfmp+v,obs (figure d). the results showed a high fmpv yield and high kfmp+v,obs at the ccc-[c/u] motifs, but a low yield and low kfmp+v,obs at the ccc-[a/g] motifs. this indicates that high-efficiency sufb -induced + frameshifting during translocation requires the presence of a [c/u] at the th nucleotide of the ccc-c motif. because sufb in these experiments was in the g -state, it is possible that a sub-population underwent + frameshifting via quadruplet-pairing with the [c/u] at the th nucleotide of the ccc-[c/u] motif during translocation. it is also possible that a sub-population underwent + frameshifting via triplet-slippage, which could potentially be inhibited by the presence of [g/a] at the th nucleotide of the motif. to verify that the post complex formed with the ccc-a sequence was largely in the -frame, we rapidly mixed the complex with an equimolar mixture of trnaser-tc, cognate to the next a-site codon in the -frame (agu), and trnaval-tc, cognate to the next a-site codon in the + -frame (guu) (figure e). the results showed a high yield and high kfmp+s,obs, supporting the notion that the post complex formed with the ccc-a motif was largely in the -frame. thus, the th nucleotide of the ccc-c motif plays a role in determining + frameshifting during translocation of sufb from the a site to the p site. 
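The competition readout used in this section depends on which codon is presented at the a site after translocation: the codon immediately following the ccc triplet if the ribosome stays in frame, or the codon one nucleotide further downstream if it has shifted forward. The sketch below illustrates this on the ensemble mrna sequence given in the methods (the 3' `(cac)` repeat is omitted because its repeat count is not recoverable here). The function names `codons_after_motif` and `with_fourth_nt` are hypothetical helpers introduced only for illustration, and the start codon is assumed to be the first `aug` in the sequence.

```python
# Ensemble mRNA from the Methods, without the trailing (cac) repeat.
MRNA = "gggaaggagguaaaaaugccccguucuaag"


def codons_after_motif(mrna):
    """Return (in_frame_codon, shifted_codon) read after the 2nd codon.

    in_frame_codon: the three nucleotides directly after the 2nd codon.
    shifted_codon:  the three nucleotides starting one position later,
                    i.e. after the four-nucleotide motif.
    """
    seq = mrna.lower()
    start = seq.find("aug")  # assumption: first aug is the start codon
    if start < 0:
        raise ValueError("no aug start codon found")
    second = start + 3  # first nucleotide of the 2nd codon
    in_frame = seq[second + 3:second + 6]
    shifted = seq[second + 4:second + 7]
    if len(in_frame) < 3 or len(shifted) < 3:
        raise ValueError("mrna too short downstream of the motif")
    return in_frame, shifted


def with_fourth_nt(mrna, nt):
    """Return the mRNA with the 4th nucleotide of the motif replaced by nt."""
    seq = mrna.lower()
    start = seq.find("aug")
    if start < 0:
        raise ValueError("no aug start codon found")
    pos = start + 6  # nucleotide right after the 2nd codon
    return seq[:pos] + nt.lower() + seq[pos + 1:]


if __name__ == "__main__":
    print(codons_after_motif(MRNA))                       # ccc-c motif
    print(codons_after_motif(with_fourth_nt(MRNA, "a")))  # ccc-a motif
```

For the ccc-c mrna this yields `cgu` (arg) in frame and `guu` (val) after a one-nucleotide forward shift, and for the ccc-a variant `agu` (ser) and `guu` (val), matching the codons named in the text for the trnaarg/trnaval and trnaser/trnaval competitions.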
the + -frameshifting efficiency of sufb depends on sequences of the anticodon loop and the ccc-c motif

to determine whether the + -frameshifting efficiency of sufb during translocation is influenced by sequences of the anticodon loop and the ccc-c motif, we performed tripeptide formation assays and monitored the yield of fmpv. in these experiments, we varied the sequence of the sufb anticodon loop and/or the ccc-c motif at the nd codon position of the mrna. to explore the possibilities of both triplet-slippage and quadruplet-pairing, we used variants of g -state sufb . we showed that variants with the potential to undergo quadruplet-pairing with the ccc-c motif resulted in fmpv yields of % and % (figures c, d). the different yields suggest that g -state sufb variants can induce triplet-slippage and/or engage in quadruplet-pairing with different efficiencies during translocation. analogous experiments showed that sufb variants that were restricted to triplet-pairing resulted in reduced fmpv yields ( % and %, respectively) upon pairing with a ccc-c motif (figures e, f). collectively, these results suggest that there is considerable plasticity in the mechanisms that sufb uses to induce + frameshifting during translocation and in the efficiencies of these mechanisms.

an smfret signal that reports on ribosome dynamics during individual elongation cycles

to address the mechanism of sufb -induced + frameshifting during translocation, we used a previously developed smfret signal to determine whether and how sufb alters the rates with which the ribosome undergoes a series of conformational changes that drive and regulate the elongation cycle (figures a-c). this signal is generated using a ribosomal large, or s, subunit that has been cy - and cy -labeled at ribosomal proteins bl and ul , respectively, to report on ‘opening’ and ‘closing’ of the l stalk of the s subunit. accordingly, individual fret efficiency (efret) vs. time trajectories recorded using this signal exhibit transitions between two fret states corresponding to the ‘open’ (efret = ~ . ) and ‘closed’ (efret = ~ . ) conformations of the l stalk (figure d). previously, we have shown that open→closed and closed→open l stalk transitions correlate with a complex series of conformational changes that take place during an elongation cycle - . the l stalk initially occupies the open conformation as an aa-trna is delivered to the a site of a s ic or post complex and peptide-bond formation generates a pre complex that is in a global conformation we refer to as global state (gs) . the pre complex then undergoes a large-scale structural rearrangement that includes an open→closed transition of the l stalk so as to occupy a second global conformation we refer to as gs (i.e., the . → . efret transition denoted by the rate k s ic→gs in figures d and e, corresponding to the multi-step s ic→gs transition in figure a). subsequently, in the absence of ef-g, the l stalk goes through successive closed→open and open→closed transitions as the pre complex undergoes multiple gs →gs and gs →gs transitions that establish a gs ⇄gs equilibrium (i.e., the . ⇄ . efret transitions denoted by the rates kgs →gs and kgs →gs and the equilibrium constant keq = (kgs →gs )/(kgs →gs ) in figure d, corresponding to the gs ⇄gs transitions in figure a). in the presence of ef-g, however, a single closed→open l stalk transition reports on conformational changes of the pre complex as it undergoes ef-g binding and completes translocation (i.e., the . → . 
efret transition denoted by the rate kgs →post in figures d and e, corresponding to the multi-step gs →post transition that takes place in the presence of ef-g and bridges across figures a and b). using this approach, we have successfully monitored the conformational dynamics of ribosomal complexes during individual elongation cycles , - , including in a study of – frameshifting .

sufb interferes with elongation complex dynamics during late steps in translocation

we began by asking whether sufb alters the dynamics of elongation complexes during the earlier steps of the elongation cycle. we stopped-flow delivered sufb - or prol-tc to s ics and recorded pre-steady-state movies during delivery, and steady-state movies min post-delivery (figures a, d, and f, supplementary figures , a, and b). the results showed that k s ic→gs , as well as kgs →gs , kgs →gs , and keq at min post-delivery, for sufb -tc were each less than -fold different than the corresponding value for prol-tc (supplementary table ). 
the close correspondence of these rates indicates that sufb -tc is delivered to the a site, participates in peptide-bond formation, undergoes gs formation, and exhibits gs →gs and gs →gs transitions within the gs ⇄gs equilibrium in a manner that is similar to prol-tc, consistent with the results of ensemble kinetic assays (figures a-c, supplementary table , and supplementary figure a) and thereby strengthening our interpretation that sufb uses triplet pairing in the -frame at the a site during the early stages of the elongation cycle that precede ef-g binding and ef-g-catalyzed translocation. although we could not confidently detect the presence of two sub-populations of a site-bound sufb in the smfret data that might differ in their propensity of undergoing + frameshifting, as suggested by the results presented in figure a, it is possible that the distance between our smfret probes and/or the time spent in one of the observed fret states are not sensitive enough to detect the structural and/or energetic differences between these sub-populations of a site-bound sufb . the development of different smfret signals and/or the use of variants of sufb and/or the ccc-c motif with different propensities of undergoing + frameshifting may allow future smfret investigations to identify and characterize such sub-populations.

we then investigated whether sufb alters the dynamics of elongation complexes during the later steps of the elongation cycle. 
we stopped-flow delivered sufb - or prol-tc and ef-g to s ics and recorded pre-steady-state movies during delivery, and steady-state movies , , , and min post-delivery (figures b, e, and g, supplementary figures c, d, and ). the results showed that k s ic→gs for sufb and prol-tc were within error of each other (supplementary table ), again suggesting that sufb -tc is delivered to the a site, participates in peptide-bond formation, and undergoes gs formation in a manner that is similar to prol-tc. notably, the k s ic→gs s obtained in the presence of ef-g were within error of the ones obtained in the absence of ef-g, consistent with reports that ef-g has little to no effect on the rate with which pre complexes undergo gs →gs transitions , . once it transitions into gs , however, the sufb pre complex can bind ef-g , and we find that it becomes arrested in an ef-g-bound gs -like conformation for up to several minutes, during which it slowly undergoes a gs →post transition (figure g, supplementary figure ). while the limited number of time points did not allow rigorous determination of kgs →post for the sufb pre complex, visual inspection (figure g) and quantitative analysis (supplementary tables and ) showed that the gs →post reaction was complete between and min post- delivery (i.e., kgs →post = ~ . – . s– ). remarkably, this range of kgs →post is up to - orders of magnitude lower than kgs →post measured for the prol pre complex (supplementary table ). it is also up to - orders of magnitude lower than kgs →post for a different pre complex measured using a different smfret signal under the same conditions and the rate of translocation measured using ensemble rapid kinetic approaches under similar conditions , . this observation suggests that sufb adopts a conformation within the ef-g-bound pre complex that significantly impedes conformational rearrangements of the complex that are known to take place during late steps in translocation. 
these rearrangements include the severing of interactions between the decoding center of the s subunit and the anticodon-codon duplex in the a site - ; forward and reverse swiveling of the ‘head’ domain of the s subunit , associated with opening and closing, respectively, of the ‘e-site gate’ of the s subunit ; reverse relative rotation of the ribosomal subunits , ; and opening of the l stalk , , . collectively, these dynamics facilitate movement of the trna asls and their associated codons from the p and a sites to the e and p sites of the s subunit. we next explored whether sufb alters the dynamics of elongation complexes after it is translocated into the p site. we prepared pre-like complexes carrying deacylated sufb or prol in the p site and a vacant a site (denoted pre–a complexes) and recorded steady-state movies for the resulting gs ⇄gs equilibria (figures c and h, supplementary figure ). the results showed that kgs →gs and kgs →gs for the sufb pre–a complex were % lower and % higher, respectively, than for the prol pre–a complex, driving a . -fold shift towards gs in the gs ⇄gs equilibrium (supplementary table ), suggesting that sufb adopts a conformation at the p site that is different from that of prol. consistent with this interpretation, a recent structural study has shown that the conformation of p site-bound sufa , a + -frameshifting trna with an extra nucleotide in the anticodon loop, is significantly distorted relative to a canonical trna . 
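The equilibrium shift described above follows arithmetically from the two rate changes: when one rate constant of a two-state equilibrium falls by a given fraction and the opposing rate constant rises by another fraction, the equilibrium constant, taken as the ratio of the raised rate to the lowered rate, changes by the quotient of the two factors. The sketch below works through that arithmetic. The function name `keq_fold_shift` and the example fractions are hypothetical and are not the measured values, which are not recoverable from the text here.

```python
def keq_fold_shift(frac_lower, frac_higher):
    """Fold change in a two-state equilibrium constant.

    The equilibrium constant is taken as K = k_raised / k_lowered, where
    k_lowered is the rate constant that drops by `frac_lower` (0.20 means
    20 % lower) and k_raised is the opposing rate constant that rises by
    `frac_higher`, both relative to a reference complex.
    """
    if not 0.0 <= frac_lower < 1.0:
        raise ValueError("frac_lower must be in [0, 1)")
    if frac_higher < 0.0:
        raise ValueError("frac_higher must be non-negative")
    return (1.0 + frac_higher) / (1.0 - frac_lower)


if __name__ == "__main__":
    # Hypothetical example: one rate 20 % lower, the opposing rate 50 % higher.
    print(round(keq_fold_shift(0.20, 0.50), 3))
```

With these illustrative inputs the equilibrium constant changes by a factor of 1.5 / 0.8, so modest changes in each individual rate compound into a larger shift of the equilibrium toward the state whose formation rate increased.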
discussion

here we leverage the high efficiency of recoding by sufb to identify the steps of the elongation cycle during which it induces + frameshifting at a quadruplet codon, thus answering the key questions of where, when, and how + frameshifting occurs. we are not aware of any other studies of + frameshifting that have addressed these questions as precisely. in addition to elucidating the determinants of reading-frame maintenance and the mechanisms of sufb -induced + frameshifting, our findings reveal new principles that can be used to engineer genome recoding with higher efficiencies.

integrating our results with the available structural, biophysical, and biochemical data on the mechanism of translation elongation results in the structure-based model for sufb -induced + frameshifting that we present in figure . in this model, post complexes to which sufb or prol are delivered exhibit virtually indistinguishable conformational dynamics in the early steps of the elongation cycle, up to and including the initial gs →gs transition. however, post complexes to which sufb is delivered exhibit a kgs →post that is more than an order-of-magnitude slower than those to which prol is delivered. notably, kgs →post comprises a series of conformational rearrangements of the ef-g-bound pre complex that facilitate translocation of the trna asls and associated codons within the s subunit. 
these rearrangements encompass the severing of decoding center interactions with the anticodon-codon duplex in the a site - ; forward and reverse head swiveling , , and associated opening and closing, respectively, of the e-site gate ; reverse relative rotation of the subunits , ; and opening of the l stalk , , (steps pre- g to pre-g , denoted with red arrows, in figure ). given the importance of these rearrangements in translocation of the trna asls and their associated codons within the s subunit, we propose that sufb -mediated perturbation of these rearrangements underlies + frameshifting. more specifically, because sufb does not seem to impede the reverse relative rotation of the subunits or opening of the l stalk during the gs →gs transitions within the gs ⇄gs equilibrium in the absence of ef-g (compare kgs →gs for sufb -tc vs. prol-tc in supplementary table ), it most likely interferes with the severing of decoding center interactions with the anticodon-codon duplex in the a site and/or forward and/or reverse head swiveling and associated opening and/or closing, respectively, of the e-site gate. the latter rearrangement is particularly important for movement of the trna asls and their associated codons within the s subunit - , , suggesting that sufb -mediated perturbation of head swiveling may make the most important contribution to + frameshifting. consistent with this, a recent structural study showed that upon forward head swiveling, the asls of the p- and a-site trnas can disengage from their associated codons and occupy positions similar to a partial + frameshift, even in the presence of a non-frameshift suppressor trna in the a site and the absence of ef-g . while previous structural studies have demonstrated that + frameshifting trnas bind to the a site in the -frame , , and to the p site in the + -frame , these studies lacked ef-g and the observed structures were obtained by directly binding a deacylated + frameshifting trna to the p site. specifically, a + frameshifting peptidyl-trna was not translocated from the a to p sites, as would be the case during an authentic translocation event. in contrast, our elucidation of the + -frameshifting mechanism was executed in the presence of ef-g and is based on extensive comparison of the kinetics with which sufb and prol undergo individual reactions of the elongation cycle (i.e., aa-trna selection, peptide-bond formation, and translocation) and the associated conformational rearrangements of the elongation complex. additionally, all of our in vitro biochemical assays, and most of our ensemble rapid kinetics assays were performed under the conditions in which the a site is always occupied by an aa- or peptidyl-trna, leaving no chance of a vacant a site. therefore, the + frameshifting mechanism we present here is distinct from that presented by farabaugh and co-workers , in which the ribosome is stalled due to a vacant a site, thus giving the + -frameshifting-inducing trna at the p site an opportunity to rearrange into the + -frame. the fact that all well-characterized + -frameshifting trnas contain an extra nucleotide in the anticodon loop, despite differences in their primary sequences, the amino acids they carry, and whether the extra nucleotide is inserted at the '- or '-sides of the anticodon, suggests that the results we report here for sufb are likely applicable to other + -frameshifting trnas with an expanded anticodon loop. 
while an expanded anticodon loop is a strong feature associated with + frameshifting, it is not associated with – frameshifting, which instead is typically induced by structural barriers in the mrna that stall a translating ribosome from moving forward, thus providing the ribosome with an opportunity to shift backwards in the – direction , . given the unique role of the expanded anticodon loop in + frameshifting, here we have identified the determinants that drive the ribosome to shift in the + direction. we show that sufb exclusively uses the triplet-slippage mechanism of + frameshifting in the m g + condition, but that it explores other mechanisms (e.g., quadruplet-pairing) in the m g – condition during translocation from the a site to the p site. under conditions that only permit the triplet-slippage mechanism (e.g., in the presence of m g ), sufb exhibits a relatively low + -frameshifting efficiency of ~ %, whereas under conditions that permit quadruplet-pairing during translocation (e.g., in the absence of m g ), it exhibits a relatively high + -frameshifting efficiency of ~ % (figures c-f, a). this feature is observed in various sequence contexts. one advantage of a quadruplet-pairing mechanism during translocation is that it would enhance the thermodynamic stability of anticodon-codon pairing during the large ef-g-catalyzed conformational rearrangements that pre complexes undergo during translocation to form post complexes. nonetheless, sufb is naturally methylated with m g (figure c), indicating that it makes exclusive use of the triplet-slippage mechanism in vivo. 
this mechanism is likely also exclusively used in vivo by all other + -frameshifting trnas that have evolved from canonical trnas to retain a purine at position , which is almost universally post-transcriptionally modified to block quadruplet-pairing mechanisms. the key insight from this work suggests an entirely novel pathway to increase the efficiency of genome recoding at quadruplet codons. while initial success in genome recoding has been achieved by engineering the anticodon-codon interactions of a + -frameshifting-inducing trna at the a site , , or by engineering a new bacterial genome with a minimal set of codons for all amino acids , we suggest that efforts to engineer the ‘neck’ structural element of the s subunit that regulates head swiveling would be as, or even more, effective. this can be achieved by screening for s subunit variants that exhibit high + -frameshifting efficiencies mediated by + -frameshifting trnas at quadruplet codons while preserving -frame translation by canonical trnas at triplet codons. specifically, head swiveling is driven by the synergistic action of two hinges within the s ribosomal rna elements that comprise the s subunit neck . hinge is composed of two g-u wobble base pairs that are separated by a bulged g within helix (h ), while hinge is composed of a gacu linker between h and h / within a three-helical junction with h . co-engineering these two hinges by directed evolution should identify such s subunit variants. 
to complement the directed evolution approach, we suggest that our recently developed time-resolved cryogenic electron microscopy (tr cryo-em) method , can be used to obtain structures of sufb and prol in ef-g-bound pre complexes captured in intermediate states of translocation. such cryo-em structures would help further define how the two hinges that control head swiveling are differentially modulated during translocation of sufb vs. prol to provide a structure-based roadmap for engineering them. in addition, detailed comparison of such structures would offer the opportunity to identify ribosomal structural elements beyond the two hinges that play a role in + frameshifting and can thus serve as additional targets for engineering. furthermore, antibiotics that bind to the s subunit and act as translocation modulators can be exploited to further increase the + -frameshifting efficiency at a quadruplet codon with either wildtype or highly efficient s subunit variants. implemented in combination and integrated into a recently described in vivo ‘designer organelle’ strategy , these approaches should provide a novel and powerful platform for increasing the efficiency of genome recoding at quadruplet codons with minimal off-target effects.

methods

construction of e. coli strains. e. coli strains that expressed a plasmid-borne prol or sufb for isolation of native-state trnas were made in a prol-ko strain, which was constructed by inserting the kan-resistance (kan-r) gene, amplified by pcr primers from pkd , into the prol locus of e. 
coli bl (de ) using the l-red recombination method , followed by removal of the kan-r gene using flp recombination . the pkk -sufb plasmid was made by site-directed mutagenesis to introduce g a into the pkk -prol plasmid . e. coli strains that expressed prol or sufb from the chromosome as an isogenic pair for reporter assays were made using the l-red technique . to construct the e. coli sufb strain, the sufb gene was pcr-amplified from pkk -sufb , and the ' end of the amplified gene was joined with kan-r (from pkd ) by pcr using reverse-primer, while the ' end was homologous to the prol ' flanking region. the pcr-amplified sufb -kan product was used to replace prol in l-red expressing cells. an isogenic counterpart strain expressing prol-kan was also made. these prol-kan and sufb -kan loci were independently transferred to the trmd-ko strain by p transduction, followed by pcp -dependent flp recombination, generating the isogenic pair of prol and sufb strains in the trmd-ko background. these strains were transformed with pkk - -lacz reporter plasmid that has the ccc-c motif at the nd codon position of the lacz gene, and the b-gal activity was measured . all primer sequences used in this work are shown in supplementary table .

preparation of translation components for ensemble biochemical experiments. the mrna used for most in vitro translation reactions is shown below, including the shine-dalgarno sequence, the aug start codon, and the ccc-c motif: '-gggaaggagguaaaaaugccccguucuaag(cac) . variants of this mrna had a base substitution in the ccc-c motif. all mrnas were transcribed from double-stranded dna templates with t rna polymerase and purified by gel electrophoresis. e. coli strains over-expressing native-state trnafmet, trnaarg (anticodon icg, where i = inosine), and trnaval (anticodon u*ac, where u* = cmo u) were grown to saturation and were used to isolate total trna. the over-expressed trna species in each total trna sample was aminoacylated by the cognate aminoacyl-trna synthetase and used directly in the tc formation reaction and subsequent tc delivery to s ics or post complexes. e. coli trnaser (anticodon acu) was prepared by in vitro transcription. aminoacyl-trnas with the cognate proteinogenic amino acid were prepared using the respective aminoacyl-trna synthetase and those with a non-proteinogenic amino acid were prepared using the dfx flexizyme and the , -dinitrobenzyl ester (dbe) of the respective amino acid (supplementary figure ). aminoacylation and formylation of trnafmet were performed in a one-step reaction in which formyl transferase and the formyl donor -formyltetrahydrofolate were added to the aminoacylation reaction . aminoacyl-trnas were stored in mm sodium acetate (naoac) (ph ) at – °c, as were six-his-tagged e. coli initiation and elongation factors and tight-coupled s ribosomes isolated from e. coli mre cells. recombinant his-tagged e. coli ef-p bearing a b-lysyl-k was expressed and purified from cells co-expressing efp, yjea, and yjek and stored at – °c .

preparation of translation components for smfret experiments. s subunits and s subunits lacking ribosomal proteins bl and ul were purified from a previously described bl - ul double deletion e. coli strain , using previously described protocols , , . 
a previously described single-cysteine variant of bl carrying a gln-to-cys substitution mutation at residue position (bl (q c)) and a previously described single-cysteine variant of ul carrying a thr-to-cys substitution mutation at residue position (ul (t c)) , were purified, labeled with cy - and cy -maleimide, respectively, to generate bl (cy ) and ul (cy ), and reconstituted into the s subunits lacking bl and ul following previously described protocols . the reconstituted bl (cy )- and ul (cy )-labeled s subunits were then re-purified using sucrose density gradient ultracentrifugation , . s subunits lacking bl (cy ) and/or ul (cy ) or harboring unlabeled bl and/or ul do not generate bl (cy )-ul (cy ) smfret signals and therefore do not affect data collection or analysis. previously, we have shown that s ics formed with these bl (cy )- and ul (cy )-containing s subunits can undergo peptide-bond formation and two rounds of translation elongation with similar efficiency as s ics formed with wild-type s subunits . the sequence of the mrna used for assembling ribosomal complexes for smfret studies is shown below, including the shine-dalgarno sequence, the aug start codon, and the ccc-c motif: '-gcaaccuaaaacucacacagggcccuaaggacauaaaaaugccccguuauccuccugcugcacucgcugcacaaaucgcucaacggcaauuaagga. 
the mrna was synthesized by in vitro transcription using t rna polymerase, and then hybridized to a previously described ’-biotinylated dna oligonucleotide (supplementary table ) that was complementary to the ' end of the mrna and was chemically synthesized by integrated dna technologies . hybridized mrna:dna-biotin complexes were stored in mm tris-oac (ph = . at ºc), mm edta, and mm kcl at – ºc until they were used in ribosomal complex assembly. aminoacylation and formylation of trnafmet (purchased from mp biomedicals) were achieved simultaneously using e. coli methionyl-trna synthetase and e. coli formylmethionyl-trna formyltransferase . expression and purification of if , if , if , ef-tu, ef-ts, and ef-g followed previously published procedures .

preparation and purification of sufb and prol. native-state sufb was isolated from a derivative of e. coli jm lacking the endogenous prol, but expressing sufb from the pkk - plasmid (supplementary table ), while native-state prol was purified from total trna isolated from e. coli jm cells over-expressing prol from the pkk - plasmid. the prol-ko strain lacking the endogenous prol was described previously . each native-state trna was isolated by a biotinylated capture probe attached to streptavidin-derivatized sepharose beads . g -state sufb and prol were also prepared by in vitro transcription. each primary transcript contained a ribozyme domain on the '-side of the trna sequence, which self-cleaved to release the trna.
m g -state sufb and prol were prepared by trmd-catalyzed and s-adenosyl methionine (adomet)-dependent methylation of each g -state trna. due to the lability of the aminoacyl linkage to pro, stocks of sufb and prol aminoacylated with pro were either used immediately or stored no longer than - weeks at – °c in mm naoac (ph . ).

primer extension inhibition assays. primer extension inhibition analyses of native-, g -, and m g -state sufb and prol were performed as described . a dna primer complementary to the sequence of c to a of sufb and prol was chemically synthesized, p-labeled at the '-end by t polynucleotide kinase, annealed to each trna, and extended by superscript iii reverse transcriptase (invitrogen) at units/µl with µm each dntp in mm tris-hcl (ph . ), mm mgcl , mm kcl, and mm dtt at °c for min; the reaction was terminated by heating at °c for min. extension was quenched with mm edta and products of extension were separated by % denaturing polyacrylamide gel electrophoresis (page/ m urea) and analyzed by phosphorimaging. in these assays, the length of the read-through cdna is - nucleotides, as in the case of the g -state sufb and prol, whereas the length of the primer-extension inhibited cdna products is - nucleotides, as in the case of the m g -state and native-state.

rnase t cleavage inhibition assays. rnase t cleaves on the '-side of g, but not m g. cleavage of trnas was performed as previously described . each trna ( µg) was '-end labeled using bacillus stearothermophilus cca-adding enzyme ( nm) with [α- p]atp at °c in mm glycine (ph . ) and mm mgcl . the labeled trna was digested by rnase t (roche, cat # ) at a final concentration of . units/µl for min at °c in mm
sodium citrate (ph . ) and mm ethylenediaminetetraacetic acid (edta). the rna fragments generated from cleavage were separated by % page/ m urea along with an rna ladder generated by alkali hydrolysis of the trna of interest. cleavage was analyzed by phosphorimaging.

methylation assays. pre-steady-state assays under single-turnover conditions were performed on a rapid quench-flow apparatus (kintek rqf- ). the trna substrate was heated to °c for . min followed by addition of mm mgcl , and slowly cooled to °c in min. n -methylation of g in the pre-annealed trna (final concentration µm) was initiated with the addition of e. coli trmd ( µm) and [ h]-adomet (perkin elmer, dpm/pmol) at a final concentration of µm in a buffer containing mm tris-hcl (ph . ), mm nh cl, mm mgcl , mm dtt, . mm edta, and . mg/ml bsa in a reaction of µl. the buffer used was optimized for trmd in order to evaluate its in vitro activity . reaction aliquots of µl were removed at various time points and precipitated in % (w/v) trichloroacetic acid (tca) on filter pads for min twice. filter pads were washed with % ethanol twice, with ether once, air dried, and measured for radioactivity in an ls scintillation counter (beckman). counts were converted to pmoles using the specific activity of the [ h]-adomet after correcting for the signal quenching by filter pads. in these assays, a negative control was always included, in which no enzyme was added to the reaction , and signal from the negative control was subtracted from signal of each sample for determining the level of methylation.

aminoacylation assays. each sufb or prol trna was aminoacylated with pro by a recombinant e. coli prors expressed from the plasmid pet and purified from e. coli bl (de ) . each trna was heat-denatured at ºc for min, and re-annealed at ºc for min.
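the count-to-pmol conversion described for the methylation assays above can be sketched as follows. this is a minimal illustration, not the authors' code: the function name, the argument names, and the representation of filter-pad quenching as a single multiplicative detection fraction are all assumptions made here, and the numbers in the example call are made up.

```python
def counts_to_pmol(sample_dpm, blank_dpm, specific_activity_dpm_per_pmol,
                   detected_fraction=1.0):
    """Convert scintillation counts to pmol of labeled product.

    sample_dpm: counts measured for the reaction aliquot.
    blank_dpm: counts measured for the no-enzyme negative control.
    specific_activity_dpm_per_pmol: specific activity of the labeled substrate.
    detected_fraction: hypothetical quench correction, i.e. the fraction of the
        true signal that is still detected on the filter pad (0 < value <= 1).
    """
    if specific_activity_dpm_per_pmol <= 0:
        raise ValueError("specific activity must be positive")
    if not 0 < detected_fraction <= 1:
        raise ValueError("detected_fraction must be in (0, 1]")
    # Undo the quenching first, then subtract the negative control.
    corrected_sample = sample_dpm / detected_fraction
    corrected_blank = blank_dpm / detected_fraction
    return (corrected_sample - corrected_blank) / specific_activity_dpm_per_pmol


# Illustrative values only; none of these numbers come from the source.
product_pmol = counts_to_pmol(5200.0, 200.0, 1000.0, detected_fraction=0.5)
```

the same arithmetic applies to the [ h]-pro aminoacylation assays, where the source describes the quench correction but does not mention a negative-control subtraction; passing `blank_dpm=0.0` covers that case.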
aminoacylation under pre-steady state conditions was performed at ºc with µm trna, µm prors, and µm [ h]-pro (perkin elmer, . ci/mmol) in a buffer containing mm kcl, mm mgcl , mm dithiothreitol (dtt), . mg/ml bovine serum albumin (bsa), mm atp (ph . ), and mm tris-hcl (ph . ) in a reaction of µl. reaction aliquots of µl were removed at different time intervals and precipitated with % (w/v) tca on filter pads for min twice. filter pads were washed with % ethanol twice, with ether once, air dried, and measured for radioactivity in an ls scintillation counter (beckman). counts were converted to pmoles using the specific activity of the [ h]-pro after correcting for signal quenching by filter pads.

cell-based + -frameshifting reporter assays. isogenic e. coli strains expressing chromosomal copies of sufb or prol were created in a previously developed trmd-knockdown (trmd-kd) background, in which the chromosomal trmd is deleted but cell viability is maintained through the arabinose-induced expression of a plasmid-borne trm , the human counterpart of trmd , that is competent for m g synthesis to support bacterial growth (supplementary table ). due to the essentiality of trmd for cell growth, a simple knock-out cannot be made. we chose human trm as the maintenance protein in the trmd-kd background, because this enzyme is rapidly degraded in e. coli once its expression is turned off to allow immediate arrest of m g synthesis.
in the isogenic sufb and prol strains, the level of m g is determined by the concentration of the added arabinose in a cellular context that expresses prom as the only competing trnapro species. in the m g + condition, where arabinose was added to . % in the medium, trna substrates of n -methylation were confirmed to be % methylated by mass spectrometry, whereas in the m g – condition, where arabinose was not added to the medium, trna substrates of n -methylation were confirmed to be % methylated by mass spectrometry . each strain was transformed with the pkk - plasmid expressing an mrna with a ccc-c motif at the nd codon position of the reporter lacz gene. to simplify the interpretation, the natural aug codon at the th position of lacz was removed. a + frameshift at the ccc-c motif would enable expression of lacz. the activity of β-gal was directly measured from lysates of cells grown in the presence or absence of . % arabinose to induce or not induce, respectively, the plasmid-borne human trm . in these assays, decoding of the ccc-c codon motif would be mediated by sufb and prom in the sufb strain, and would be mediated by prol and prom in the prol strain. due to the presence of prom in both strains, there would be no vacancy at the ccc-c codon motif.

cell-based + frameshifting lolb assays. to quantify the + -frameshifting efficiency at the ccc-c motif at the nd codon position of the natural lolb gene, the ratio of protein synthesis of lolb to cyss was measured by western blots.
overnight cultures of the isogenic strains expressing sufb or prol were separately inoculated into fresh lb media in the presence or absence of . % arabinose and were grown for h to produce the m g + and m g – conditions, respectively. cultures were diluted - to -fold into fresh media to an optical density (od) of ~ . and grown for another h. cells were harvested and µg of total protein from cell lysates was separated on % sds-page and probed with rabbit polyclonal primary antibodies against lolb (at a , dilution) and against cysrs (at a , dilution), followed by goat polyclonal anti-rabbit igg secondary antibody (sigma-aldrich, #a ). the ratio of protein synthesis of lolb to cyss was quantified using supersignal west pico chemiluminescent substrate (thermo fisher) in a chemi-doc xr imager (bio-rad) and analyzed by image lab software (bio-rad, soft-lit- - -ilspc-v- - ). to measure the + -frameshifting efficiency, we measured the ratio of protein synthesis of lolb to cyss for each trna in each condition, and we normalized the observed ratio in the control sample (i.e., prol in the m g + condition) to . , indicating that protein synthesis of these two genes was in the -frame with no + frameshifting. a decrease of this ratio was interpreted as a proxy of + frameshifting at the ccc-c motif at the nd codon position of lolb. from the observed ratio of each sample in each condition, we calculated the + frameshifting efficiency relative to the control sample.

cell-free purexpress in vitro translation assays. the fola gene, provided as part of the e.
coli purexpress (new england biolabs) in vitro translation system, was modified by site-directed mutagenesis to introduce a ccc-c motif into the th codon position. if sufb induced + frameshifting at this motif, a full-length dhfr would be made, whereas if sufb failed to do so, a c-terminal truncated fragment (Δc) would be made due to premature termination of protein synthesis. because sufb has no orthogonal trna synthetase for aminoacylation with a non-proteinogenic amino acid, we used the flexizyme ribozyme technology for this purpose. coupled in vitro transcription-translation of the modified e. coli fola gene containing the ccc-c motif at the th codon position was conducted in the presence of [ s]-met using the purexpress system. sds-page analysis was used to detect [ s]-met-labeled polypeptides, which included the full-length dhfr, the Δc fragment, and a Δn fragment that likely resulted from initiation of translation at a cryptic site downstream from the ccc-c motif (figure d). the fraction of the full-length fola gene product, the Δc fragment, and the Δn fragment was calculated from the amount of each in the sum of all three products. we attribute the overall low recoding efficiency ( . – . %) to a combination of the rapid hydrolysis of the prolyl linkage, which is the least stable among aminoacyl linkages , and the lack of sufb re-acylation in the purexpress system. in these assays, each trna was tested in the g -state and each was normalized by the flexizyme aminoacylation efficiency, which was ~ % for pro and pro analogues. the purexpress system contained all natural e. coli trnas, such that the ccc-c codon motif would not have a chance of vacancy even when a specific ccc-reading trna was absent.

rapid kinetic gtpase assays. ensemble gtpase assays were performed using the codon-walk approach, in which an e.
coli in vitro translation system composed of purified components is supplemented with the requisite trnas and translation factors to interrogate individual steps of the elongation cycle. programmed with a previously validated synthetic aug-ccc-cgu-u mrna template , , a s ic was assembled that positioned the aug start codon and an initiator fmet-trnafmet at the p site and the ccc-c motif at the a site. reactions to monitor the ef-tu-dependent hydrolysis of gtp during delivery and accommodation of a tc to the a site were conducted at °c in a buffer containing mm tris-hcl (ph . ), mm nh cl, mm kcl, mm mgcl , mm dtt, and . mm spermidine . each tc was formed by incubating ef-tu with nm [γ- p]-gtp ( ci/mmole) for min at ºc, after which aminoacylated sufb or prol was added and the incubation continued for min at ºc. unbound [γ- p]-gtp was removed from the tc solution by gel filtration through a spin cartridge (centrispin- ; princeton separations). equal volumes of each purified tc and a solution of s ics were rapidly mixed in the rqf- kintek chemical quench apparatus . final concentrations in these reactions were . µm for the s ic; . µm for mrna; . µm each for ifs , , and ; . µm for fmet-trnafmet; . µm for ef-tu; . µm for aminoacylated sufb or prol; and . mm for cold gtp.
the yield of gtp hydrolysis and kgtp,obs upon rapid mixing of each tc with excess s ics were measured by removing aliquots of the reaction at defined time points, quenching the aliquots with % formic acid, separating [γ- p] from [γ- p]-gtp using thin layer chromatography (tlc), and quantifying the amount of each as a function of time using phosphorimaging . we adjusted reaction conditions such that the kgtp,obs increased linearly as a function of s ic concentration.

rapid kinetic di- and tripeptide formation assays. di- and tripeptide formation assays were performed using the codon-walk approach described above in mm tris-hcl (ph . ), mm nh cl, mm kcl, . mm mgcl , mm dtt, . mm spermidine, at °c unless otherwise indicated . s ics were formed by incubating s ribosomes, mrna, [ s]-fmet-trnafmet, and ifs , , and , and gtp, for min at °c in the reaction buffer. separately, tcs were formed in the reaction buffer by incubating ef-tu and gtp for min at °c followed by adding the requisite aa-trnas and incubating in an ice bath for min. in dipeptide formation assays, s ics templated with the specified variants of an aug-nnn-ngu-u mrna were mixed with sufb -tc or prol-tc. fmp formation was monitored in an rqf- kintek chemical quench apparatus. in tripeptide formation assays, s ics templated with the specified variants of the aug-ncc-ngu-u mrna were mixed, either in one step or in two steps, with equimolar mixtures of sufb -, trnaval (anticodon u*ac, where u* = cmo u)-, and trnaarg (anticodon icg, where i = inosine)-tcs and ef-g.
formation of fmpv and fmpr was monitored in an rqf- kintek chemical quench apparatus. tripeptide formation assays with one-step delivery of tcs were initiated by rapidly mixing the s ic with two or more of the tcs in the rqf- kintek chemical quench apparatus. final concentrations in these reactions were . µm for the s ic; . µm for mrna; . µm each for ifs , , and ; . µm for [ s]-fmet-trnafmet; . µm for ef-g; . µm for ef-tu for each aa-trna; . µm each for the aa-trnas; and mm for gtp. for tripeptide formation assays with one-step delivery of g -state sufb -, trnaval-, and trnaarg-tcs to the s ics, the yield of fmpv and kfmpv,obs report on the activity of ribosomes that shifted to the + -frame, whereas the yield of fmpr and kfmpr,obs report on the activity of ribosomes that remained in the -frame , . we chose g -state sufb to maximize its + -frameshifting efficiency but native-state trnaval and trnaarg to prevent them from undergoing unwanted frameshifting (note that, for simplicity, we have not denoted the aminoacyl or dipeptidyl moieties of the trnas). tripeptide formation assays with two-step delivery of tcs were performed in a manner similar to those with one-step delivery of tcs, except that the s ics were incubated with a sufb - or prol-tc and . µm ef-g for . - min, as specified, followed by manual addition of an equimolar mixture of trnaarg- and trnaval-tcs. reactions were conducted at °c unless otherwise specified, and were quenched by adding concentrated koh to . m. after a brief incubation at °c, aliquots of . µl were spotted onto a cellulose-backed plastic tlc sheet and electrophoresed at v in pyrac buffer ( mm pyridine, . m acetic acid, ph . ) until the marker dye bromophenol blue reached the water-oil interface at the anode . the position of the origin was adjusted to maximize separation of the expected oligopeptide products. the separation of unreacted [ s]-fmet and each of the [ s]-fmet-peptide products was visualized by phosphorimaging and quantified using imagequant (ge healthcare) and kinetic plots were fitted using kaleidagraph (synergy software).

assembly and purification of s ics, tcs, post, and pre–a complexes for use in smfret experiments. s ics were assembled in a manner analogous to those for the ensemble rapid kinetic studies described above, except that the mrna containing an aug-ccc-cgu-u coding sequence was '-biotinylated and the s subunits were labeled with bl (cy ) and ul (cy ). more specifically, s ics were assembled in three steps. first, pmol of s subunits, pmol of if , pmol of if , pmol of if , nmol of gtp, and pmol of biotin-mrna in µl of tris-polymix buffer ( mm tris-(hydroxymethyl)-aminomethane acetate (tris-oac) (ph °c = . ), mm kcl, mm nh oac, . mm ca(oac) , . mm edta, mm -mercaptoethanol (bme), mm putrescine dihydrochloride, and mm spermidine (free base)) at mm mg(oac) were incubated for min at ºc. then pmol of fmet-trnafmet in µl of mm koac (ph = ) was added to the reaction, followed by an additional incubation of min at ºc. finally, pmol of bl (cy )- and ul (cy )-labeled s subunits in µl of reconstitution buffer ( mm tris-hcl (ph °c = . ), mm mg(oac) , mm nh cl, . mm edta, and mm bme) was added to the reaction to give a final volume of µl, followed by a final incubation of min at ºc.
the reaction was then adjusted to µl with tris-polymix buffer at mm mg(oac) , loaded onto a - % (w/v) sucrose gradient prepared in tris-polymix buffer at mm mg(oac) , and purified by sucrose density gradient ultracentrifugation to remove any free mrna, ifs, and fmet-trnafmet. purified s ics were aliquoted, flash frozen in liquid nitrogen, and stored at – ºc until use in smfret experiments.

tcs were prepared in two steps. first, pmol of ef-tu and pmol of ef-ts in µl of tris-polymix buffer at mm mg(oac) supplemented with gtp charging components ( mm gtp, mm phosphoenolpyruvate, and units/ml pyruvate kinase) were incubated for min at ºc. then, pmol of aa-trna in µl of mm naoac (ph = ) was added to the reaction, followed by an additional incubation of min at ºc. this resulted in a tc solution with a final volume of µl that was then stored on ice until used for smfret experiments.

to prepare pre–a complexes, we first needed to assemble post complexes. post complexes were assembled by first preparing a -µl solution of s ic and a -µl solution of tc as described above. separately, a solution of gtp-bound ef-g was prepared by incubating pmol ef-g in µl of tris-polymix buffer at mm mg(oac) supplemented with gtp charging components for min at room temperature. then µl of the s ic, µl of the tc, and . µl of the gtp-bound ef-g solution were mixed, and incubated for min at room temperature and for an additional min on ice.
the resulting post complex was diluted by adjusting the reaction volume to µl with tris-polymix buffer at mm mg(oac) and purified via sucrose density gradient ultracentrifugation as described above for the s ics. purified post complexes were aliquoted, flash frozen in liquid nitrogen, and stored at – ºc until use in smfret experiments. pre–a complexes were then generated by mixing µl of post complex, µl of a mm puromycin solution (prepared in nanopure water and filtered using a . µm filter), and µl of tris-polymix buffer at mm mg(oac) and incubating the mixture for min at room temperature. pre–a complexes were used for smfret experiments immediately upon preparation.

smfret imaging using total internal reflection fluorescence (tirf) microscopy. s ics or pre–a complexes were tethered to the peg/biotin-peg-passivated and streptavidin-derivatized surface of a quartz microfluidic flowcell via a biotin-streptavidin-biotin bridge between the biotin-mrna and the biotin-peg , . untethered s ics or pre–a complexes were removed from the flowcell, and the flowcell was prepared for smfret imaging experiments, by flushing it with tris-polymix buffer at mm mg(oac) supplemented with an oxygen-scavenging system ( . mm protocatechuic acid (ph = ) (sigma aldrich) and nm protocatechuate- , -dioxygenase (ph = . ) (sigma aldrich)) and a triplet-state-quencher cocktail ( mm , , , -cyclooctatetraene (aldrich) and mm -nitrobenzyl alcohol (fluka)) .
tethered s ics or pre–a complexes were imaged at single-molecule resolution using a laboratory-built, wide-field, prism-based total internal reflection fluorescence (tirf) microscope with a -nm, diode-pumped, solid-state laser (laser quantum) excitation source delivering a power of - mw as measured at the prism to ensure the same power density on the imaging plane. the cy and cy fluorescence emissions were simultaneously collected by a . numerical aperture, ×, water-immersion objective (nikon) and separated based on wavelength using a two-channel, simultaneous-imaging system (dual view™, optical insights llc). the cy and cy fluorescence intensities were recorded using a × pixel, back-illuminated electron-multiplying charge-coupled-device (emccd) camera (andor ixon ultra ) operating with × pixel binning at an acquisition time of . seconds per frame controlled by software μmanager . . this microscope allows direct visualization of thousands of individual s ics or pre–a complexes in a field-of-view of × µm . each movie was composed of frames in order to ensure that the majority of the fluorophores in the field-of-view were photobleached within the observation period. for stopped-flow experiments using tethered s ics, we delivered . µm of g -state sufb - or prol-tc in the absence of ef-g or, when specified, in the presence of a µm saturating concentration of ef-g. stopped-flow experiments proceeded by recording an initial pre-steady-state movie of a field-of-view that captured conformational changes taking place during delivery followed by recording of one or more steady-state movies of different fields-of-view that captured conformational changes taking place the specified number of minutes post-delivery.

analysis of smfret experiments. for each tirf microscopy movie, we identified fluorophores, aligned cy and cy imaging channels, and generated fluorescence intensity vs.
time trajectories for each pair of cy and cy fluorophores using custom-written software (manuscript in preparation; jason hon, colin kinz-thompson, ruben l. gonzalez) as described previously . for each time point, cy fluorescence intensity values were corrected for cy bleedthrough by subtracting % of the cy fluorescence intensity value in the corresponding cy fluorescence intensity vs. time trajectory. efret vs. time trajectories were generated by using the cy fluorescence intensity (icy ) and the bleedthrough-corrected cy fluorescence intensity (icy ) from each aligned pair of cy and cy fluorophores to calculate the efret value at each time point using efret = (icy / (icy + icy )). for both pre-steady-state and steady-state movies (figures d- h and supplementary figures , , and , supplementary tables - ), an efret vs. time trajectory was selected for further analysis if all of the transitions in the fluorescence intensity vs. time trajectory were anti-correlated for the corresponding, aligned pair of cy and cy fluorophores, and the cy fluorescence intensity vs. time trajectory underwent single-step cy photobleaching, demonstrating it arose from a single ribosomal complex. in the case of pre-steady-state movies (figures d- g, supplementary figures and and tables - ), efret vs. time trajectories had to meet two additional criteria in order to be selected for further analysis: (i) efret vs. time trajectories had to be stably sampling efret = .
prior to tc delivery, thereby confirming that the corresponding ribosomal complex was a s ic carrying an fmet-trnafmet at the p site and (ii) efret vs. time trajectories had to exhibit at least one . → . transition after delivery of tcs, thereby confirming that the corresponding s ic had accommodated a pro-sufb or pro-prol into the a site, that the a site-bound pro-sufb or pro-prol had participated as the acceptor in peptide-bond formation, and that the resulting pre complex was capable of undergoing gs →gs transitions. we note that the second criterion might result in the exclusion of efret vs. time trajectories in which cy or cy simply photobleached prior to undergoing a . → . transition, and could therefore result in a slight overestimation of k s ic→gs and/or kgs →gs (see below for a detailed description of how k s ic→gs , kgs →gs , and other kinetic and thermodynamic parameters were estimated). nonetheless, the number of such efret vs. time trajectories should be exceedingly small. this is because the rates with which the fluorophore that photobleached the fastest, cy , entered into the photobleached state (∅) from the gs , gs , ef-g-bound gs -like, and post states were kgs →∅ = . ± . s– , kgs →∅ = . ± . s– , kgs (g)→∅ = . ± . s– (where the subscript “(g)” denotes experiments performed in the presence of ef-g), and kpost→∅ = . ± . s– , respectively (see below for a detailed description of how kgs →∅, kgs →∅, kgs (g)→∅, and kpost→∅ were estimated). these rates are, on average, about -fold lower than those of k s ic → gs and kgs → gs ( . – . s– and . – .
s– (supplementary table )). consequently, we do not expect the measurements of k s ic→gs and kgs →gs to be limited by cy or cy photobleaching. additionally, even if k s ic→gs and kgs →gs were slightly overestimated, they would be expected to be equally overestimated for sufb - and prol ribosomal complexes given that the rate of photobleaching would be expected to be very similar for sufb - and prol ribosomal complexes. furthermore, because we are primarily concerned with the relative values of k s ic→gs and kgs →gs for sufb - vs. prol ribosomal complexes, rather than with the absolute values of k s ic→gs and kgs →gs for the sufb - and prol ribosomal complexes, such slight overestimations do not affect the conclusions of the work presented here.

to calculate k s ic→gs and the corresponding error from the pre-steady-state experiments, we analyzed the s ic survival probabilities (supplementary figure , tables and ) , . briefly, for each trajectory, we extracted the time interval during which we were waiting for the s ic to undergo a transition to gs and used these ‘waiting times’ to construct a s ic survival probability distribution, as shown in supplementary figure . all s ic survival probability distributions were best described by a single exponential decay function of the type

$Y = A\,e^{-t/\tau_{\mathrm{s\,ic}}}$ , ( )

where Y is the survival probability, A is the initial population of s ic, t is time, and τ s ic is the time constant with which s ic transitions to a pre complex in the gs state. k s ic→gs was then calculated using the equation k s ic→gs = / τ s ic.
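the survival-probability analysis described above can be sketched in a few lines. this is a hedged illustration only: the source does not say how the exponential was fitted, so the log-linear least-squares fit used here is a stand-in chosen because it needs nothing beyond the standard library, and the function names are hypothetical.

```python
import math


def survival_curve(waiting_times):
    """Return (t, fraction of complexes that have not yet transitioned) pairs.

    After the i-th shortest waiting time (0-based), n - i - 1 of the n
    complexes are still waiting.
    """
    times = sorted(waiting_times)
    n = len(times)
    return [(t, (n - i - 1) / n) for i, t in enumerate(times)]


def fit_single_exponential(points):
    """Fit Y = A * exp(-t / tau) by a straight-line fit of ln(Y) against t.

    Points with Y <= 0 are skipped because their logarithm is undefined.
    Returns (A, tau); the rate constant is then 1 / tau.
    """
    pts = [(t, math.log(y)) for t, y in points if y > 0]
    if len(pts) < 2:
        raise ValueError("need at least two points with Y > 0")
    n = len(pts)
    mean_t = sum(t for t, _ in pts) / n
    mean_l = sum(l for _, l in pts) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in pts)
    slope = sxy / sxx
    intercept = mean_l - slope * mean_t
    return math.exp(intercept), -1.0 / slope


# Synthetic waiting times in seconds, for illustration only.
example_waits = [0.4, 0.9, 1.3, 2.2, 3.1, 4.8, 7.5]
amplitude, tau = fit_single_exponential(survival_curve(example_waits))
rate_constant = 1.0 / tau
```

a log-linear fit weights the sparsely populated tail of the distribution more heavily than a direct nonlinear fit would, so for real data a nonlinear least-squares routine may be preferable.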
errors were calculated as the standard deviation of technical triplicates. six sets of kinetic and/or thermodynamic parameters were calculated from hidden markov model (hmm) analyses of the recorded movies. these parameters are defined here as:
(i) kgs →gs , kgs →gs , and keq from the pre-steady-state and steady-state movies recorded for the delivery of sufb - and prol-tcs in the absence of ef-g (figures d, f, and supplementary figure and table );
(ii) kgs →post from the pre-steady-state movie recorded for the delivery of prol-tc in the presence of ef-g (figures e, g, and supplementary figure and table );
(iii) the fractional population of the post complex from the pre-steady-state and steady-state movies recorded for the delivery of sufb - and prol-tcs in the presence of ef-g (figures e, g, and supplementary figure and table );
(iv) kgs →gs , kgs →gs , and keq from a sub-population of pre complexes that lacked an a site-bound, deacylated sufb in the steady-state movies recorded for the longer time points (i.e., , , and min) after the delivery of sufb -tc in the presence of ef-g (figures g, supplementary table );
(v) kgs →gs , kgs →gs , and keq from the steady-state movies recorded for the sufb - and prol pre–a complexes (figures h and supplementary figure and table ); and
(vi) kgs →∅, kgs →∅, kgs (g)→∅, and kpost→∅ from the movies described in (i)-(v) (figures d-h, supplementary figures , , and , and reported two paragraphs above).
to calculate these parameters, we extended the variational bayes approach we introduced in the vbfret algorithm to estimate a ‘consensus’ (i.e., ‘global’) hmm of the efret vs. time trajectories. in this approach, we use bayesian inference to estimate a single, consensus hmm that is most consistent with all the efret vs. time trajectories in a movie, rather than to estimate a separate hmm for each trajectory in the movie.
to estimate such a consensus hmm, we assume each trajectory is independent and identically distributed, thereby enabling us to perform the inference using the likelihood function

$$\mathcal{L} = \prod_{i \,\in\, \mathrm{trajectories}} \mathcal{L}_i,$$

where $\mathcal{L}_i$ is the variational approximation of the likelihood function for a single trajectory. subsequently, the single, consensus hmm that is most consistent with all of the trajectories is estimated using the expectation-maximization algorithm that we have previously described . viterbi paths (supplementary figures , , and ), representing the most probable hidden-state trajectory, were then calculated from the hmm using the viterbi algorithm . based on extensive smfret studies of translation elongation using the bl (cy )-ul (cy ) smfret signal , , , we selected a consensus hmm composed of three states for further analysis of the data. for calculation of the kinetic and/or thermodynamic parameters in (i), (iv), and (v), the three states corresponded to gs , gs , and ∅ and for calculation of the kinetic and/or thermodynamic parameters in (ii) and (iii), the three states corresponded to ef-g-bound gs -like, post, and ∅. the transition matrix of the consensus hmm was then used to calculate kgs →gs and kgs →gs in (i), (iv), and (v); kgs →post in (ii); kgs →∅, kgs →∅, kgs (g)→∅, and kpost→∅ in (vi); and the errors corresponding to each of these parameters.
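To make the consensus idea concrete, the sketch below scores one shared HMM against several trajectories by summing per-trajectory log-likelihoods (the logarithm of the product above) and decodes a Viterbi path. It is an illustration under stated assumptions rather than the authors' method: it uses the ordinary forward algorithm with fixed parameters instead of a variational approximation, it assumes Gaussian emissions, and every function name is hypothetical.

```python
import math


def log_gauss(x, mu, sigma):
    """Log density of a normal distribution (assumed emission model)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def logsumexp(vals):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))


def trajectory_log_likelihood(traj, log_pi, log_A, mus, sigmas):
    """Forward algorithm in log space for a single trajectory."""
    n = len(log_pi)
    alpha = [log_pi[s] + log_gauss(traj[0], mus[s], sigmas[s]) for s in range(n)]
    for x in traj[1:]:
        alpha = [
            logsumexp([alpha[r] + log_A[r][s] for r in range(n)])
            + log_gauss(x, mus[s], sigmas[s])
            for s in range(n)
        ]
    return logsumexp(alpha)


def consensus_log_likelihood(trajs, log_pi, log_A, mus, sigmas):
    """Independent trajectories: the joint log-likelihood is the sum over trajectories."""
    return sum(trajectory_log_likelihood(t, log_pi, log_A, mus, sigmas) for t in trajs)


def viterbi(traj, log_pi, log_A, mus, sigmas):
    """Most probable hidden-state path for one trajectory."""
    n = len(log_pi)
    delta = [log_pi[s] + log_gauss(traj[0], mus[s], sigmas[s]) for s in range(n)]
    back = []
    for x in traj[1:]:
        # Best predecessor of each state s at this time step.
        prev = [max(range(n), key=lambda r: delta[r] + log_A[r][s]) for s in range(n)]
        delta = [
            delta[prev[s]] + log_A[prev[s]][s] + log_gauss(x, mus[s], sigmas[s])
            for s in range(n)
        ]
        back.append(prev)
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]
```

Working with log-likelihoods avoids numerical underflow when many trajectories are combined, which is why the product is evaluated as a sum here.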
this transition matrix consists of a 3 × 3 matrix in which the off-diagonal elements correspond to the number of times a transition takes place between each pair of the gs , gs , and ∅ states (in (i), (iv), (v), and (vi)) or each pair of the ef-g-bound gs -like, post, and ∅ states (in (ii) and (vi)) and the on-diagonal elements correspond to the number of times a transition does not take place out of the gs , gs , and ∅ states (in (i), (iv), (v), and (vi)) or out of the ef-g-bound gs -like, post, and ∅ states (in (ii) and (vi)). each element of this matrix parameterizes a dirichlet distribution, from which we calculated the mean and the square root of the variance for four transition probabilities pgs →gs , pgs →gs , pgs →∅, and pgs →∅ (in (i), (iv), (v), and (vi)) or for three transition probabilities pgs →post, pgs (g)→∅, and ppost→∅ (in (ii) and (vi)). these transition probabilities were then used to calculate the corresponding four rate constants, kgs →gs , kgs →gs , kgs →∅, and kgs →∅ (in (i), (iv), (v), and (vi)) or three rate constants, kgs →post, kgs (g)→∅, and kpost→∅ (in (ii) and (vi)) using the equation

$$k = -\frac{\ln(1 - p)}{t},$$

where t is the time interval between data points (t = . s). we propagated the error for the transition probabilities into the error for the rate constants using the equation

$$\sigma_k = \frac{\sigma_p}{(1 - p) \times t},$$

where σ_p is the standard deviation of p (the square root of its variance) and σ_k is the standard deviation of k. keq in (i), (iv), and (v) was determined using the equation keq = kgs →gs / kgs →gs .
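The conversion from transition counts to rate constants can be sketched as follows. This is a hedged example, not the authors' code: it assumes that one row of the count matrix supplies the parameters of a Dirichlet distribution over the transitions out of a single state, the function names are hypothetical, and the counts and the 0.1 s time step used in any call are placeholders rather than values from the study.

```python
import math


def dirichlet_mean_std(counts, j):
    """Mean and standard deviation of component j of a Dirichlet distribution
    whose parameters are the entries of `counts` (one row of the count matrix)."""
    a0 = sum(counts)
    mean = counts[j] / a0
    var = counts[j] * (a0 - counts[j]) / (a0 ** 2 * (a0 + 1))
    return mean, math.sqrt(var)


def rate_from_probability(p, sigma_p, dt):
    """k = -ln(1 - p) / dt, with the error propagated as
    sigma_k = sigma_p / ((1 - p) * dt), i.e. |dk/dp| * sigma_p."""
    k = -math.log(1.0 - p) / dt
    sigma_k = sigma_p / ((1.0 - p) * dt)
    return k, sigma_k
```

For example, `rate_from_probability(*dirichlet_mean_std([90, 8, 2], 1), dt=0.1)` turns a hypothetical row of counts into a rate constant and its propagated error. The propagation formula is the first-order (derivative-based) approximation, so it is most accurate when σ_p is small relative to 1 − p.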
the fractional populations of the post complex in (iii) and the corresponding errors were calculated by marginalizing, which in this case simply amounts to calculating the mean and the standard error of the mean, for the conditional probabilities of each efret data point given each hidden state. because the data points preceding the initial s ic→gs transition in the pre-steady-state movies do not contribute to the kinetic and/or thermodynamic parameters in (i)-(vi), these data points were not included in the analyses that were used to determine these thermodynamic parameters.
quantification and statistical analyses
all ensemble biochemical experiments and cell-based reporter assays were repeated at least three times and the mean values and standard deviations for each experiment or assay are reported. all smfret experiments were performed in at least three technical replicates, and trajectories from all of the technical replicates for each experiment were combined prior to generating the surface contour plot of the time evolution of population fret and modeling with the hmm. mean values and errors for the transition rates and fractional populations determined from modeling with an hmm are reported (for details see “analysis of smfret experiments” in methods). mean values and standard deviations for the k s ic→gs values were determined from technical triplicates of the survival plots analysis for each experiment and are reported.
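The marginalization used for the fractional populations in (iii) reduces to a mean and a standard error of the mean over per-data-point state probabilities. The short sketch below illustrates that arithmetic only; the function name and the layout of `posteriors` (one list of state probabilities per efret data point) are assumptions made for the example.

```python
import math
import statistics


def fractional_population(posteriors, state):
    """Mean and standard error of the mean of P(state | data point)
    across all data points; `posteriors` is a list of per-point lists."""
    values = [gamma[state] for gamma in posteriors]
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem
```

Because the probabilities of all states sum to one at each data point, the fractional populations returned for the different states also sum to one.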
data and code availability
data availability
with the exception of the smfret data, all other data supporting the findings of this study are presented within this article. due to the lack of a public repository for smfret data, the smfret data supporting the findings of this study are available from the corresponding authors upon request. source data are provided with this paper.
code availability
the code used to analyze the tirf movies in this study is described in a manuscript in preparation (jason hon, colin kinz-thompson, ruben l. gonzalez), where r.l.g. is the corresponding author. therefore, the code is available from r.l.g. upon request.
references
. wang, k., schmied, w.h. & chin, j.w. reprogramming the genetic code: from triplet to quadruplet codes. angew chem int ed engl , - ( ). . chen, y. et al. controlling the replication of a genomically recoded hiv- with a functional quadruplet codon in mammalian cells. acs synth biol , - ( ). . lee, b.s., kim, s., ko, b.j. & yoo, t.h. an efficient system for incorporation of unnatural amino acids in response to the four-base codon agga in escherichia coli. biochim biophys acta , - ( ). . chatterjee, a., lajoie, m.j., xiao, h., church, g.m. & schultz, p.g. a bacterial strain with a unique quadruplet codon specifying non-native amino acids. chembiochem , - ( ). . niu, w., schultz, p.g. & guo, j. an expanded genetic code in mammalian cells with a functional quadruplet codon. acs chem biol , - ( ). . wang, n., shang, x., cerny, r., niu, w. & guo, j. 
systematic evolution and study of uagn decoding trnas in a genomically recoded bacteria. sci rep , ( ). . neumann, h., wang, k., davis, l., garcia-alai, m. & chin, j.w. encoding multiple unnatural amino acids via evolution of a quadruplet-decoding ribosome. nature , - ( ). . wang, k. et al. optimized orthogonal translation of unnatural amino acids enables spontaneous protein double-labelling and fret. nat chem , - ( ). . atkins, j.f., loughran, g., bhatt, p.r., firth, a.e. & baranov, p.v. ribosomal frameshifting and transcriptional slippage: from genetic steganography and cryptography to adventitious use. nucleic acids res , - ( ). . atkins, j.f. & bjork, g.r. a gripping tale of ribosomal frameshifting: extragenic suppressors of frameshift mutations spotlight p-site realignment. microbiol mol biol rev , - ( ). . roth, j.r. frameshift suppression. cell , - ( ). . bossi, l. & roth, j.r. four-base codons acca, accu and accc are recognized by frameshift suppressor sufj. cell , - ( ). . qian, q. et al. a new model for phenotypic suppression of frameshift mutations by mutant trnas. mol cell , - ( ). . weiss, r.b., dunn, d.m., shuh, m., atkins, j.f. & gesteland, r.f. e. coli ribosomes re- phase on retroviral frameshift signals at rates ranging from to percent. new biol , - ( ). . jager, g., nilsson, k. & bjork, g.r. the phenotype of many independently isolated + frameshift suppressor mutants supports a pivotal role of the p-site in reading frame maintenance. plos one , e ( ). . fagan, c.e., maehigashi, t., dunkle, j.a., miles, s.j. & dunham, c.m. structural insights into translational recoding by frameshift suppressor trnasufj. rna , - ( ). . maehigashi, t., dunkle, j.a., miles, s.j. & dunham, c.m. structural insights into + frameshifting promoted by expanded or modification-deficient anticodon stem loops. proc natl acad sci u s a , - ( ). . dunham, c.m. et al. structures of trnas with an expanded anticodon loop in the decoding center of the s ribosomal subunit. 
rna , - ( ). . hong, s. et al. mechanism of trna-mediated + ribosomal frameshifting. proc natl acad sci u s a , - ( ). . sroga, g.e., nemoto, f., kuchino, y. & bjork, g.r. insertion (sufb) in the anticodon loop or base substitution (sufc) in the anticodon stem of trna(pro) from salmonella typhimurium induces suppression of frameshift mutations. nucleic acids res , - ( ). . caliskan, n., katunin, v.i., belardinelli, r., peske, f. & rodnina, m.v. programmed - frameshifting by kinetic partitioning during impeded translocation. cell , - ( ). . taylor, d.j. et al. structures of modified eef s ribosome complexes reveal the role of gtp hydrolysis in translocation. embo j , - ( ). . khade, p.k. & joseph, s. messenger rna interactions in the decoding center control the rate of translocation. nat struct mol biol , - ( ). . liu, g. et al. ef-g catalyzes trna translocation by disrupting interactions between decoding center and codon-anticodon duplex. nat struct mol biol , - ( ). . abeyrathne, p.d., koh, c.s., grant, t., grigorieff, n. & korostelev, a.a. ensemble cryo-em uncovers inchworm-like translocation of a viral ires through the ribosome. elife , doi: . /elife. ( ). . schuwirth, b.s. et al. structures of the bacterial ribosome at . a resolution. science , - ( ). . pulk, a. & cate, j.h. control of ribosomal subunit rotation by elongation factor g. science , ( ). . ratje, a.h. et al. head swivel on the ribosome facilitates translocation by means of intra-subunit trna hybrid sites. nature , - ( ). . gamper, h.b., masuda, i., frenkel-morgenstern, m. & hou, y.m. 
maintenance of protein synthesis reading frame by ef-p and m( )g -trna. nat commun , ( ). . masuda, i. et al. trna methylation is a global determinant of bacterial multi-drug resistance. cell syst , - e ( ). . christian, t. & hou, y.m. distinct determinants of trna recognition by the trmd and trm methyl transferases. j mol biol , - ( ). . murakami, h., ohta, a., ashigai, h. & suga, h. a highly flexible trna acylation method for non-natural polypeptide synthesis. nat methods , - ( ). . walker, s.e. & fredrick, k. recognition and positioning of mrna in the ribosome by trnas with expanded anticodons. j mol biol , - ( ). . gamper, h.b., masuda, i., frenkel-morgenstern, m. & hou, y.m. the ugg isoacceptor of trnapro is naturally prone to frameshifts. int j mol sci , - ( ). . fei, j. et al. allosteric collaboration between elongation factor g and the ribosomal l stalk directs trna movements during translation. proc natl acad sci u s a , - ( ). . ning, w., fei, j. & gonzalez, r.l., jr. the ribosome uses cooperative conformational changes to maximize and regulate the efficiency of translation. proc natl acad sci u s a , - ( ). . fei, j., kosuri, p., macdougall, d.d. & gonzalez, r.l., jr. coupling of ribosomal l stalk and trna dynamics during translation elongation. mol cell , - ( ). . fei, j., richard, a.c., bronson, j.e. & gonzalez, r.l., jr. transfer rna-mediated regulation of ribosome dynamics during protein synthesis. nat struct mol biol , - ( ). . boel, g. et al. the abc-f protein etta gates ribosome entry into the translation elongation cycle. nat struct mol biol , - ( ). . chen, b. et al. etta regulates translation by binding the ribosomal e site and restricting ribosome-trna dynamics. nat struct mol biol , - ( ). . kim, h.k. et al. a frameshifting stimulatory stem loop destabilizes the hybrid state and impedes ribosomal translocation. proc natl acad sci u s a , - ( ). 
. munro, j.b., wasserman, m.r., altman, r.b., wang, l. & blanchard, s.c. correlated conformational events in ef-g and the ribosome regulate translocation. nat struct mol biol , - ( ). . blanchard, s.c., kim, h.d., gonzalez, r.l., jr., puglisi, j.d. & chu, s. trna dynamics on the ribosome during translation. proc natl acad sci u s a , - ( ). . studer, s.m., feinberg, j.s. & joseph, s. rapid kinetic analysis of ef-g-dependent mrna translocation in the ribosome. j mol biol , - ( ). . wintermeyer, w. & rodnina, m.v. translational elongation factor g: a gtp-driven motor of the ribosome. essays biochem , - ( ). . ermolenko, d.n. et al. observation of intersubunit movement of the ribosome in solution using fret. j mol biol , - ( ). . ermolenko, d.n. & noller, h.f. mrna translocation occurs during the second step of ribosomal intersubunit rotation. nat struct mol biol , - ( ). . cornish, p.v. et al. following movement of the l stalk between three functional states in single ribosomes. proc natl acad sci u s a , - ( ). . nguyen, h.a., hoffer, e.d. & dunham, c.m. importance of a trna anticodon loop modification and a conserved, noncanonical anticodon stem pairing in trnacggpro for decoding. j biol chem , - ( ). . guo, z. & noller, h.f. rotation of the head of the s ribosomal subunit during mrna translocation. proc natl acad sci u s a , - ( ). . zhou, j., lancaster, l., donohue, j.p. & noller, h.f. spontaneous ribosomal translocation of mrna and trnas into a chimeric hybrid state. proc natl acad sci u s a , - ( ). . korniy, n., samatova, e., anokhina, m.m., peske, f. & rodnina, m.v. 
mechanisms and biomedical implications of - programmed ribosome frameshifting on viral and bacterial mrnas. febs lett , - ( ). . lajoie, m.j. et al. genomically recoded organisms expand biological functions. science , - ( ). . wang, k., de la torre, d., robertson, w.e. & chin, j.w. programmed chromosome fission and fusion enable precise large-scale genome rearrangement and assembly. science , - ( ). . mohan, s., donohue, j.p. & noller, h.f. molecular mechanics of s subunit head rotation. proc natl acad sci u s a , - ( ). . kaledhonkar, s. et al. late steps in bacterial translation initiation visualized using time-resolved cryo-em. nature , - ( ). . chen, b. et al. structural dynamics of ribosome subunit association studied by mixing-spraying time-resolved cryogenic electron microscopy. structure , - ( ). . reinkemeier, c.d., girona, g.e. & lemke, e.a. designer membraneless organelles enable codon reassignment of selected mrnas in eukaryotes. science ( ). . datsenko, k.a. & wanner, b.l. one-step inactivation of chromosomal genes in escherichia coli k- using pcr products. proc natl acad sci u s a , - ( ). . fei, j. et al. a highly purified, fluorescently labeled in vitro translation system for single-molecule studies of protein synthesis. methods enzymol , - ( ). . christian, t., lahoud, g., liu, c. & hou, y.m. control of catalytic cycle by a pair of analogous trna modification enzymes. j mol biol , - ( ). . zhang, c.m., perona, j.j., ryu, k., francklyn, c. & hou, y.m. distinct kinetic mechanisms of the two classes of aminoacyl-trna synthetases. j mol biol , - ( ). . peacock, j.r. et al. amino acid-dependent stability of the acyl linkage in aminoacyl-trna. rna , - ( ). 
. aitken, c.e., marshall, r.a. & puglisi, j.d. an oxygen scavenging system for improvement of dye stability in single-molecule fluorescence experiments. biophys j , - ( ). . gonzalez, r.l., jr., chu, s. & puglisi, j.d. thiostrepton inhibition of trna delivery to the ribosome. rna , - ( ). . desai, b.j. & gonzalez, r.l., jr. multiplexed, bioorthogonal labeling of multicomponent, biomolecular complexes using genomically encoded, non-canonical amino acids. biorxiv doi: . / ( ). . macdougall, d.d. & gonzalez, r.l., jr. translation initiation factor regulates switching between different modes of ribosomal subunit joining. j mol biol , - ( ). . bronson, j.e., fei, j., hofman, j.m., gonzalez, r.l., jr. & wiggins, c.h. learning rates and states from biophysical time series: a bayesian approach to model selection and single-molecule fret data. biophys j , - ( ). . viterbi, a.j. error bounds for convolutional codes and an asymptotically optimum decoding algorithm. ieee trans. inform. theory , - ( ). . agirrezabala, x. et al. visualization of the hybrid state of trna binding promoted by spontaneous ratcheting of the ribosome. mol cell , - ( ).
acknowledgements
we thank dr. hajime tokuda for rabbit polyclonal anti-lolb antibodies, dr. colin kinz-thompson and korak kumar ray for help with smfret data analysis. r.l.g. and h.l. thank the columbia university precision biomolecular characterization facility for access to and support of instrumentation. 
this work was supported by nih grants gm to y-m.h. and gm to r.l.g., a charles h. revson foundation postdoctoral fellowship in biomedical science - to h.l., a japanese jsps overseas postdoctoral fellowship to i.m., and nsf grant che- to e.j.p.
author contributions
h.g. conceived of and performed ensemble rapid kinetic assays, r.l.g. and h.l. conceived of and designed smfret assays, h.l. performed smfret assays, i.m. performed cell-based reporter assays, d.m.r. and e.j.p. generated aminoacyl-dbe derivatives, t.c. performed g methylation and aminoacylation assays, and a.b.c. and g.b. provided e. coli s ribosomes. y.m.h. and r.l.g. wrote the manuscript.
competing financial interests
the authors declare no competing interests.
contact for reagent and resource sharing
further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contacts ruben l. gonzalez, jr. (rlg @columbia.edu) and ya-ming hou (ya-ming.hou@jefferson.edu).
figure legends
figure . methylation and aminoacylation of sufb and prol. a sequence and secondary structure of native-state sufb , showing the n -methylated g in red and the g a insertion to prol in blue. b rnase t cleavage inhibition assays of trmd-methylated g -state sufb transcript confirm the presence of m g and m g a. cleavage products are marked by the nucleotide positions of gs. l: the molecular ladder of trna fragments generated from alkali hydrolysis. c primer extension inhibition assays identify m g in native-state sufb . 
red and blue arrows indicate positions of primer extension inhibition products at the methylated g and g a, respectively, which are offset by one nucleotide relative to prol. the first primer extension inhibition product for sufb corresponds to m g a, the second corresponds to m g , while the primer extension inhibition product for prol corresponds to m g . due to the propensity of primer extension to make multiple stops on a long transcript of trna, the read-through primer extension product ( - nucleotides) had a reduced intensity relative to the primer extension inhibition products ( - nucleotides). molecular size markers are provided by the primer alone ( nucleotides) and the run-off products ( - nucleotides). d trmd-catalyzed n methylation of g -state sufb and prol as a function of time. e, f prors-catalyzed aminoacylation. e aminoacylation of native-state sufb and prol. f aminoacylation of g -state sufb and prol as a function of time. in panels b, c, gels were performed three times with similar results, while in panels d-f, the bars are sd of three independent (n = ) experiments, and the data are presented as mean values ± sd.
figure . sufb -induced + frameshifting and genome recoding. a the + -frameshifting efficiency in cell-based lacz assay for sufb and prol strains in m g + and m g – conditions. the bars in the graph are sd of four, five, or six independent (n = , , or ) biological repeats, and the data are mean values ± sd. b the difference in the ratio of protein synthesis of lolb to cyss for sufb and prol strains in m g + and m g – conditions relative to prol in the m g + 
condition. c measurements underlying the bar plots in panel b. each ratio was measured directly and the ratio of prol in the m g + condition was normalized to . . the difference of each ratio relative to the normalized ratio represented the + -frameshifting efficiency at the ccc-c motif at the nd codon of lolb. the bars in the graph are sd of three independent (n = ) biological repeats, and the data are mean values ± sd. in a, b, decoding of the ccc-c motif was mediated by sufb and prom in the sufb strain, and by prol and prom in the prol strain, where the presence of prom ensured no vacancy at the ccc-c motif. the increased + frameshifting in the m g – condition vs. the m g + condition indicates that sufb and prol are each an active determinant in decoding the ccc-c motif. d sufb -mediated insertion of non-proteinogenic amino acids at the ccc-c motif in the th codon position of fola using [ s]-met-dependent in vitro translation. reporters of fola are denoted by +/– ccc-c, where “+” and “–” indicate constructs with and without the ccc-c motif. sds-page analysis identifies full-length dhfr resulting from a + -frameshift event at the ccc-c motif by sufb pre-aminoacylated with the amino acid shown at the top of each lane, a dc fragment resulting from lack of the + -frameshift event, and a dn fragment resulting from translation initiation at the aug codon likely at position or downstream from the ccc-c motif. gel samples were derived from the same experiment, which was performed five times with similar results. gels for each experiment were processed in parallel. lane : full-length dhfr as the molecular marker; deacyl: deacylated trna.
figure . sufb uses a triplet anticodon-codon pairing scheme at the a site. a gtp hydrolysis by ef-tu as a function of time for delivery of g - or native-state sufb - or prol-tc to the a site of a s ic. 
although the concentration of tcs was limiting, which would limit the rate of binding of tcs to the s ic, the observed differences in the yield of gtpase activity indicated that binding was not the sole determinant, but that other factors, such as the identity and the methylation state of the trna, affected the gtpase activity. b dipeptide fmp formation as a function of time for delivery of g - or native-state sufb - or prol-tc to the a site of a s ic. due to the limiting concentration of the s ic, which did not include the trna substrate, the yield of di- or tri-peptide formation assays was constant even with different trnas in tcs. c the yield of fmp and fmr in dipeptide formation assays in which equimolar mixtures of native-state sufb -tc, carrying pro and/or arg, and/or native-state prol-tc, carrying pro and/or arg, are delivered to s ics. the mrna in s ics in (a-c) is aug-ccc-cgu-u. d dipeptide formation rate kfmp,obs for delivery of g -state sufb -tc to s ics containing sequence variants of the ccc-c motif in the a site. in panels a, b, the bars in the graphs are sd of three independent (n = ) experiments, in panel c, the bars in the graphs are sd of four independent (n = ) experiments, and in panel d, the bars in the graphs are sd of three or four independent (n = or ) experiments. all data are presented as mean values ± sd. ∆t: a time interval, nd: not detected.
figure . plasticity of sufb -induced + frameshifting. 
a fmp formation as a function of time upon delivery of the g c variant of g -state sufb -tc to the a site of a s ic, allowing nucleotides - to pair with a ccc-c motif at the a site. b fmp formation as a function of time upon delivery of the g c variant of g -state sufb -tc to the a site of a s ic, allowing nucleotides - to pair with a ccc-c motif. c-f results of fmpv formation assays in which sufb -tc is delivered to an a site programmed with a quadruplet codon at the nd position and sequences of the sufb anticodon loop and/or quadruplet codon are varied. yields of fmpv formation represent + frameshifting during translocation of sufb from the a site to the p site. possible + -frame anticodon-codon pairing schemes of sufb during translocation: c g -state sufb capable of frameshifting at a ccc-c motif via quadruplet pairing and/or triplet slippage, d g c variant of g -state sufb capable of frameshifting at a gcc-c motif via quadruplet pairing and/or triplet slippage, e m g -state sufb capable of frameshifting at a ccc-c motif via only triplet slippage, and f g c variant of g -state sufb capable of frameshifting at a ccc-c motif via only triplet slippage. in panels a, b, the bars in the graphs are sd of three (n = ) independent experiments, and the data are presented as mean values ± sd. ∆t: a time interval.
figure . sufb shifts to the + -frame during translocation. a relative fmpv and fmpr formation as a function of time upon rapid delivery of ef-g and an equimolar mixture of g -state sufb -, trnaval-, and trnaarg-tcs to s ics carrying a ccc-c motif in the a site. 
b relative fmpv and fmpr formation as a function of time when a defined time interval is introduced between delivery of g -state sufb -tc and ef-g and delivery of an equimolar mixture of trnaarg- and trnaval-tcs. c relative fmpv and fmpr formation after reacting fmp-post complexes with a mixture of trnaval- and trnaarg-tcs based on the time courses in supplementary figures d-f. d fmpv formation as a function of time upon rapid delivery of trnaval-tc to an fmp-post complex carrying a ccc-n motif in the a site. e relative fmpv and fmps formation as a function of time upon rapid delivery of an equimolar mixture of trnaval- and trnaser-tcs to an fmp-post complex carrying a ccc-a motif in the a site. in panels a-e, the bars are sd of three (n = ) independent experiments and the data are presented as mean values ± sd. arg: arginyl-trnaarg; val: valyl-trnaval.
figure . sufb interferes with elongation complex dynamics during late steps of translocation. a-c cartoon representation of elongation as a g -state sufb - or prol-tc is delivered to the a site of a bl (cy )- and ul (cy )-labeled s ic; a in the absence, or b in the presence of ef-g, or c upon using puromycin (pmn) to deacylate the p site-bound g -state sufb or prol and generate the corresponding pre–a complex. the s and s subunits are tan and light blue, respectively; the l stalk is dark blue; cy and cy are bright green and red spheres, respectively; ef-tu is pink; ef-g is purple; fmet-trnafmet is dark green; and sufb or prol is dark red. d, e hypothetical (top) and representative experimentally observed (bottom) efret vs. time trajectories recorded as prol-tc is delivered to a s ic, d in the absence and e in the presence of ef-g as depicted in a, b. the waiting times associated with k s ic→gs , kgs →gs , kgs →gs , and kgs →post are indicated in each hypothetical trajectory. f, g, and h surface contour 
plots of the time evolution of population fret obtained by superimposing individual efret vs. time trajectories in the experiments in a, b, and c, respectively, for sufb (top) and prol (bottom). n: the number of trajectories used to construct each contour plot. surface contours are colored as denoted in the population color bars. for pre-steady-state experiments, the black dashed lines indicate the time at which the tc was delivered and the gray shaded areas denote the time required for the majority ( - %) of the s ics to transition to gs . note that the rate of deacylated sufb dissociation from the a site under our conditions is similar to that of ef-g-catalyzed translocation, thereby resulting in the buildup of a pre complex sub-population over - min post-delivery that lacks an a site trna and is incapable of translocation. this sub-population exhibits kgs →gs , kgs →gs , and keq values similar to those observed in experiments recorded in the absence of ef-g (supplementary table ).
figure . structure-based mechanistic model for sufb -induced + frameshifting. a sufb -tc uses triplet anticodon-codon pairing in the -frame at a ccc-c motif, undergoes peptide-bond formation, and enables the resulting pre complex to undergo a gs →gs transition, all with rates similar to those of prol-tc.
during the gs →gs transition, the s subunit rotates relative to the s subunit by º in the counter-clockwise (+) direction along the black curved arrow; the s subunit head swivels relative to the s subunit body by º in the clockwise (–) direction against the black curved arrow; the l stalk closes by ~ Å; and the trnas are reconfigured from their p/p and a/a to their p/e and a/p configurations. ef-g then binds to the pre complex to form pre-g and subsequently catalyzes a series of conformational rearrangements of the complex (pre-g to pre-g ) that encompass further counter-clockwise and clockwise rotations of the subunits; severing of decoding center interactions with the anticodon-codon duplex in the a site; counter-clockwise and clockwise swiveling of the head and the associated opening and closing of the e-site gate; opening of the l stalk; and reconfigurations of the trnas as they move from the p and a sites to the e and p sites. it is during these steps, shown as red arrows within the gray shaded box, that sufb impedes forward and/or reverse swiveling of the head and the associated opening and/or closing of the e-site gate, facilitating + frameshifting. next, ef-g and the deacylated trna dissociate from pre-g , leaving a post complex ready to enter the next elongation cycle. the cartoons depicting pre-g (gs ) and pre-g (gs ) were generated using biological assemblies and , respectively, of pdb entry v d.
due to the lack of an a-site trna or ef-g in v d, cartoons of the a- and p-site trnas from previous structures were positioned into the two assemblies using the p-site trnas in v d as guides and a cartoon of ef-g generated from v d was manually positioned near the factor binding site of the ribosomes. the cartoons depicting pre-g , pre-g , and pre-g were generated from v d, w , and v f, respectively, and colored as in figure , with the head domain shown in orange.
[figure: efret vs. time trajectories and population fret contour plots for sufb and prol, recorded upon delivery of tc, tc + ef-g, or pmn; axes: efret vs. time (s) or time (min).]
[figure: structural model of the states from pre through the pre-g intermediates to post, annotated with intersubunit rotation and head swivel angles.]
distinct cryo-em structure of α-synuclein filaments derived by tau
alimohammad hojjatian , anvesh k. r. dasari , urmi sengupta , dianne taylor , nadia daneshparvar , fatemeh abbasi yeganeh , lucas dillard , brian michael , robert g. griffin , mario borgnia , rakez kayed , kenneth a. taylor , kwang hun lim ,*
institute of molecular biophysics, florida state university, tallahassee, fl - , usa. department of chemistry, east carolina university, greenville, nc , usa. departments of neurology, neuroscience and cell biology, university of texas medical branch, galveston, tx, , usa. genome integrity and structural biology laboratory, national institute of environmental health sciences, national institutes of health, department of health and human services, research triangle park, nc, , usa. department of chemistry and francis bitter magnet laboratory, massachusetts institute of technology, cambridge, ma, , usa.
corresponding author: limk@ecu.edu
biorxiv preprint, this version posted january , . ; doi: https://doi.org/ . / . . . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license, http://creativecommons.org/licenses/by-nc-nd/ .
abstract
recent structural studies of ex vivo amyloid filaments extracted from human patients demonstrated that the ex vivo filaments associated with different disease phenotypes adopt diverse molecular conformations distinct from those of in vitro amyloid filaments. a very recent cryo-em structural study also revealed that ex vivo α-synuclein filaments extracted from multiple system atrophy (msa) patients adopt molecular structures quite distinct from those of in vitro α-synuclein filaments, suggesting the presence of co-factors for α-synuclein aggregation in vivo. here, we report structural characterizations of α-synuclein filaments derived by a potential co-factor, tau, using cryo-em and solid-state nmr. our cryo-em structure of the tau-promoted α-synuclein filament at . Å resolution is somewhat similar to one of the polymorphs of in vitro α-synuclein filaments. however, the n- and c-terminal regions of the tau-promoted α-synuclein filament have different molecular conformations. our structural studies highlight the conformational plasticity of α-synuclein filaments, requiring additional structural investigation of not only more ex vivo α-synuclein filaments, but also in vitro α-synuclein filaments formed in the presence of diverse co-factors to better understand the molecular basis of the diverse molecular conformations of α-synuclein filaments.
introduction
aggregation of α-synuclein into amyloid filaments is associated with numerous neurodegenerative diseases including parkinson’s disease (pd), dementia with lewy bodies (dlb), and multiple system atrophy (msa), collectively termed synucleinopathies. increasing evidence suggests that the protein aggregates play a key role in the initiation and spreading of pathology in the neurodegenerative diseases. it was shown that α-synuclein aggregates are capable of spreading through the brain and acting as seeds to promote misfolding and aggregation in a prion-like manner. although the precise molecular mechanisms underlying the neurodegenerative disorders have remained elusive, misfolded α-synuclein aggregates including oligomeric and sonicated fibrillar species exhibit cytotoxic activities. in addition, injection of preformed filamentous α-synuclein aggregates into mice induced pd-like pathology. structural elucidation of filamentous α-synuclein aggregates is, therefore, essential to understanding the molecular basis of the neurotoxic properties of α-synuclein aggregates and to developing therapeutic strategies. α-synuclein is a -residue protein expressed predominantly in the dopaminergic neurons. the intrinsically disordered protein adopts heterogeneous ensembles of conformations. the diverse conformers in the conformational ensemble might be induced to form distinct amyloid aggregates with different molecular conformations depending on experimental conditions (figure ). indeed, recent high-resolution structural studies using solid-state nmr and cryo-em revealed that α-synuclein filaments can adopt diverse molecular conformations under various in vitro experimental conditions. structural analyses of α-synuclein aggregates seeded by brain extracts from pd and msa patients suggested that the brain-derived aggregates are heterogeneous mixtures of filaments that are distinct from in vitro α-synuclein filaments.
very recently, high-resolution cryo-em structures of α-synuclein filaments extracted from msa and dlb patients were reported. interestingly, two types of α-synuclein filaments consisting of two twisting asymmetric protofilaments were observed in msa filaments extracted from patients. on the other hand, ex vivo dlb filaments were untwisted and morphologically different from those of ex vivo msa. the structural studies revealed that ex vivo α-synuclein filaments are structurally diverse and quite distinct from in vitro α-synuclein filaments produced in buffer, suggesting that diverse co-factors may exist in vivo and induce formation of different α-synuclein filaments.
figure . a schematic diagram of the energy landscape for α-synuclein aggregation.
α-synuclein remains largely unfolded at low protein concentrations (< . mm) under physiological conditions. the formation of filamentous aggregates is triggered under aggregation-prone conditions such as higher protein concentrations and more acidic ph. misfolding and aggregation of α-synuclein are also promoted by interactions with a variety of co-factors such as lipids, poly(adp-ribose) (par), and other pathological aggregation-prone proteins such as tau and aβ( – ) peptides. the co-factors may interact with monomeric α-synuclein and lead to distinct misfolding pathways, resulting in different molecular conformations.
comparative structural analyses of in vitro α-synuclein filaments derived by co-factors and brain-derived ex vivo α-synuclein filaments are required to identify co-factors that promote α-synuclein aggregation in vivo. our previous nmr study revealed that tau interacts with the c-terminal region of α-synuclein, accelerating the formation of α-synuclein filaments. here we report a structural investigation of tau-promoted α-synuclein filaments using solid-state nmr and cryo-em to examine the effect of these interactions on the structure of α-synuclein filaments. our initial solid-state nmr studies indicate that the tau-promoted α-synuclein filaments have similar structural features to those of one of the polymorphs. however, the cryo-em structure of the tau-promoted α-synuclein filaments at . Å resolution revealed distinct molecular conformations in the n- and c-terminal regions with a much faster helical twist, suggesting that the co-factor, tau, directs α-synuclein into a distinct misfolding and aggregation pathway.
experimental methods
protein expression and purification
α-synuclein: full-length α-synuclein was expressed in bl (de ) e. coli cells using pet a plasmid (a gift from michael j fox foundation, addgene plasmid # ) and was purified at ℃ as previously described. briefly, the transformed e. coli cells were grown at °c in lb medium to an od of . . the protein expression was induced by addition of iptg to a final concentration of mm and the cells were harvested by centrifugation after hrs of incubation at °c.
the bacterial pellet was resuspended in lysis buffer ( mm tris, mm nacl, ph . ) and sonicated at °c. the soluble fraction of the lysate was precipitated with ammonium sulfate ( %). the resulting protein pellet, collected by centrifugation at g, was resuspended in mm tris buffer (ph . ) and the protein solution was dialyzed against mm tris buffer overnight at ℃. α-synuclein was purified by anion exchange chromatography (hitrap q hp; mm tris buffer, ph ) and size exclusion chromatography (hiload / superdex pg; mm phosphate buffer, ph . ) at ℃.
tau: recombinant full-length tau ( n r) protein was expressed and purified from bl (de ) e. coli cells transformed with the pet b plasmid (a gift from dr. smet-nocca, université de lille, sciences et technologies, france) as previously described. briefly, the cells were grown at °c in lb medium to an od of . , induced by addition of . mm iptg, and incubated for - hrs at °c. after the induction, the cells were harvested by centrifugation. the bacterial pellet was resuspended in the lysis buffer and sonicated at °c. the soluble fraction was heated at ℃ for min and the precipitates were removed by centrifugation. the supernatant containing tau protein was purified by cation exchange chromatography (hitrap sp hp; mm mes, mm dtt, mm mgcl , mm egta, mm pmsf) followed by size exclusion chromatography (hiload / superdex pg; mm phosphate buffer, mm nacl, mm dtt, ph . ).
preparation of tau-promoted α-synuclein filaments
to prepare α-synuclein filaments in the presence of tau, monomeric α-synuclein ( µm in mm phosphate buffer, ph .
) was mixed with tau monomers ( µm in mm phosphate buffer, ph . ) and incubated at ℃ for day under constant agitation at rpm in an orbital shaker. filamentous aggregates were examined with transmission electron microscopy (tem).
tem
α-synuclein filament solution ( mg/ml) was diluted times with mm phosphate buffer (ph . ) and µl of the diluted solution was placed on a formvar/carbon supported mesh copper grid. after sec of incubation of the sample on the tem grid, excess sample was blotted off with filter paper. the grids were washed briefly with µl of % uranyl acetate. the samples were then stained with µl of % uranyl acetate for sec and the excess stain was blotted off with filter paper. the grids were then allowed to air dry and tem images were collected using a philips cm transmission electron microscope at an accelerating voltage of kv.
cryo-em data collection
four microliters of α-synuclein filament solution were applied to the back of each of the glow-discharged r / quantifoil grids. to form vitrified ice, the grids were manually plunge-frozen into liquid ethane cooled to liquid-nitrogen temperature after seconds of blotting with filter paper. grids were examined on a titan krios electron microscope operated at kv and equipped with a gatan k camera. the defocus was set randomly within the , - , Å range. the images were collected with the gatan automated data collection software latitude s (gatan, inc). the magnification was set to , , giving a nominal pixel size of . Å (the calibrated pixel size was found to be . Å).
image processing
movies were corrected for beam-induced motion (within and among frames) and dose-weighted using motioncor . aligned (non-dose-weighted) integrated micrographs were used for contrast transfer function (ctf) estimation of each micrograph, using gctf . using relion -beta ,
filaments were manually picked and extracted with helical extraction. two-dimensional ( d) classification in cistem was performed to identify the segments with better quality and no crossing filaments. segments from the best-looking classes were selected and moved to relion for further processing. helical d classification in relion moved all of the segments into a limited number of classes, independent of the values used for the regularization parameter (t). using a cylinder (produced by relion_helix_toolbox) as the initial model and a very tight mask, the segments were helically d refined with local search for symmetry, starting from . Å and - ° values for helical rise and helical twist, respectively . the result of the refinement (resolution: ~ Å) was then lowpass filtered to Å and used for d classification without alignment with a very tight mask (t= ) into classes, which resulted in two improved classes containing the major portion of the particles (~ , and ~ , particles). each of these two classes was processed, but only the class with ~ , particles produced a higher resolution structure. following the same methodology used in similar studies , we continued with d classification into class, starting with the class average from the last d classification lowpass filtered to Å (t= ) with the same tight mask to focus the refinement on the separation of the subunits of the α-synuclein within the mask. a step-by-step increase of the value of t up to resulted in a higher resolution for the reconstruction.
then, to down-weight the influence of the mask, we extended the binary mask well beyond the diameter of the filament to include the structure inside the mask. local search for helical twist and helical rise converged to - . ° and . Å. the structure hinted at a higher-level helical symmetry with . ° and . Å for helical twist and helical rise, respectively, and thus those values were used for further refinement. the handedness of the filaments was initially imposed arbitrarily. later, using a tomography data set of the same filament, the filaments were verified as left-handed. auto-refinement, with t= , resulted in the best map with the highest resolution. two separate rounds of beam-tilt correction using ctfrefine in relion were done to improve the overall resolution as well as the visual quality of the map. per-particle ctf refinement, however, did not improve the resolution of the map. using relion post-processing, we determined the overall reconstruction resolution to be Å (figure s ). local resolution was determined using the corresponding relion tool and local sharpening was done using localdeblur . over-sharpening was seen in the last iterations of local sharpening. consequently, the result with the lowest amount of noise was selected for model building.
atomic model building and refinement
resolution of the tau-promoted α-synuclein filament density map was not high enough for de novo modeling of the structure.
however, our solid-state nmr data showed that the resonances for certain residues in our filaments are similar to those of a previously reported structure (pdb: rt ) for α-synuclein filaments. hence, the atomic model was built in coot , using rt as the initial model and starting from the region having residues with resonances of high similarity (residues - ). then a poly-alanine model was built into the density and the residues were later replaced by the correct sequence. the atomic model was refined using phenix real space refinement , manually modified in coot and validated using phenix (table s and s ). the lack of well-resolved sidechains makes it difficult to investigate salt-bridges between protofilaments. therefore, we used mdff with explicit solvent to examine the dynamic interactions in the atomic model. water molecules were added in vmd and the solvent was neutralized with mm nacl to simulate physiological ionic strength conditions. salt-bridges between protofilaments (k , e ) were detected using the salt-bridge module of vmd.
data availability: the electron density map is available in the electron microscopy data bank (emdb) with id emd- and the atomic model is available in the protein data bank (pdb) with id l h. the raw data, intermediate maps, masks, and intermediate atomic models are all available from the authors upon request.
results
solid-state nmr of tau-promoted α-synuclein filaments
monomeric α-synuclein ( µm) was incubated in the presence of tau monomers ( µm) at ℃ in mm phosphate buffer (ph . ).
long homogeneous filamentous aggregates were observed after hrs of incubation in the presence of tau (figure ). solid-state nmr was initially used to compare structural features of the tau-promoted filaments to those of previously reported in vitro α-synuclein filaments (figure ). the two-dimensional c- c correlation spectrum obtained with the dipolar-assisted rotational resonance (darr) mixing scheme suggests that the tau-promoted filaments (figure a) have distinct molecular conformations from those of two α-synuclein filaments (figure c and d). in contrast, the d darr spectrum of the tau-promoted filaments is somewhat similar to that of the in vitro filament (red in figure b) with notable differences (black in figure b). these solid-state nmr results suggest that the co-factor, tau, induces the formation of a specific fibrillar conformation.
figure . representative tem images of tau-promoted α-synuclein filaments showing the homogeneous twisting filaments.
figure . overview of the aliphatic region of d c - c darr nmr spectra of uniformly c/ n labeled α-synuclein filament polymorphs. (a) tau-promoted α-synuclein polymorph. (b) fibril-type α-synuclein polymorph (bmrb ) . (c) ribbon-type α-synuclein polymorph (bmrb ). (d) greek-key type α-synuclein polymorph (bmrb ) . cross-peaks with similar nmr resonances for the tau-promoted α-synuclein polymorph and ribbon-type polymorph are colored red in b. the ribbon- and fibril-type polymorphs of α-synuclein have distinct molecular packing arrangement and intermolecular interactions.
the nmr cross-peaks were drawn using our experimental darr spectrum for the tau-promoted filaments (a) and chemical shifts reported in bmrb for the previously reported darr spectra of α-synuclein filaments (b – d).
cryo-em structure of the tau-promoted α-synuclein filaments
cryo-em was then used to determine a near-atomic structure of the tau-promoted α-synuclein filaments. preformed tau-promoted filaments were frozen on a carbon-coated grid and images were acquired at , x magnification on a titan krios ( kv) equipped with a k gatan direct electron detector camera. about , segments extracted from , micrographs were analyzed using relion reference-free two-dimensional ( d) classification. the initial classification analyses revealed one major species in the d classes (figure a). the d classes show that the protofilaments twist around each other with a crossover distance of Å (figure b) and a helical rise of . Å based on the power spectrum (figure c). the left-twisting handedness was determined by cryo-electron tomography. the d classes were used for three-dimensional ( d) helical reconstruction in relion , which resulted in a d density map at . Å resolution (figure a).
figure . d class averages of the tau-promoted α-synuclein filaments. (a) representative d class averages of tau-derived filaments using cistem (box size of Å). (b) a sinogram representing the full rotation along the helical axis of the filament produced by relion_helix_inimodel d as described by scheres. (c) the power spectrum of selected d reference-free class averages.
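the relationship between the crossover distance, helical rise and helical twist quoted above can be sketched in a few lines of python. this is a minimal illustration under stated assumptions, not the authors' processing script: the function names are hypothetical, the numbers in the usage block are placeholders rather than values from this study, and the pseudo-2_1 screw conversion (half the rise, twist of 180° plus half the per-subunit twist) is a commonly used convention for two-protofilament amyloid filaments that may or may not correspond to the "higher-level helical symmetry" described in the methods.

```python
def twist_from_crossover(rise_angstrom: float, crossover_angstrom: float) -> float:
    """Return the magnitude of the helical twist per subunit in degrees.

    A filament made of two protofilaments looks the same again after a
    180 degree rotation, which takes one crossover distance along the axis.
    """
    if rise_angstrom <= 0 or crossover_angstrom <= 0:
        raise ValueError("rise and crossover distance must be positive")
    return 180.0 * rise_angstrom / crossover_angstrom


def pitch_from_twist(rise_angstrom: float, twist_deg: float) -> float:
    """Return the full helical pitch (one 360 degree turn) in angstroms."""
    if twist_deg == 0:
        raise ValueError("an untwisted filament has no finite pitch")
    return 360.0 * rise_angstrom / abs(twist_deg)


def to_pseudo_2_1_screw(rise_angstrom: float, twist_deg: float) -> tuple[float, float]:
    """Re-express per-subunit helical parameters under a pseudo-2_1 screw axis.

    Assumption (not taken from the source): the rise is halved and the twist
    becomes 180 degrees plus half the signed per-subunit twist.
    """
    return rise_angstrom / 2.0, 180.0 + twist_deg / 2.0


if __name__ == "__main__":
    # Placeholder values for illustration only.
    rise = 4.8          # angstroms per subunit
    crossover = 900.0   # angstroms per 180 degree rotation

    twist = twist_from_crossover(rise, crossover)
    print(f"twist magnitude: {twist:.2f} degrees")
    print(f"full pitch: {pitch_from_twist(rise, twist):.0f} angstroms")

    # A left-handed filament is described with a negative twist.
    screw_rise, screw_twist = to_pseudo_2_1_screw(rise, -twist)
    print(f"pseudo-2_1 screw: rise {screw_rise:.2f} angstroms, twist {screw_twist:.2f} degrees")
```

the crossover distance is the half-pitch, so a shorter crossover at the same rise means a larger twist per subunit, which is the sense in which one filament can be said to twist faster than another.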
figure . structural comparison of α-synuclein filament polymorphs. (a) overlay of the tau-promoted α-synuclein filament atomic model on the density map. (b and c) α-synuclein filament polymorphs a and b, respectively, determined by previous cryo-em structural studies. (d) overlaid structures of the tau-promoted α-synuclein filament (purple) and polymorph b (green). the same salt bridge between the residues k and e was observed in the interfacial region of polymorph b (figure c) and the tau-promoted α-synuclein polymorph (figure d).
the d density map of the tau-promoted α-synuclein filaments was similar to that of polymorph b, as was suggested by our solid-state nmr (figure a and b). it is, however, interesting to note that tau induced the formation of only one polymorph (figure c), although the previous study showed that the protofilaments were assembled into the two fibril polymorphs in the same buffer (figure b and c). the tau-promoted α-synuclein filament also exhibits notable differences in comparison to that of the polymorph, particularly the n- and c-terminal regions (figure d and figure s ).
firstly, the interaction between the n-terminal ( - ) and c-terminal ( - ) regions is not observed in the tau-promoted filament. secondly, the more extensive c-terminal region ( - ) is disordered in comparison with that of the other structure ( - ), which might be due to interactions between the positively charged tau and the negatively charged c-terminal region of α-synuclein (figure s ). thirdly, the tau-promoted filaments, with a half-pitch of nm, are twisted much more tightly than polymorph b ( nm) (figure d and figure s ).

the structural model for the tau-promoted filaments was compared to the previously reported structures of α-synuclein filaments (figure ). our tau-promoted α-synuclein filaments adopt an overall greek-key type structure observed in the first solid-state nmr structure of α-synuclein filaments. however, several regions including the n- and c-terminal regions (residues - and - ) are notably different from the previous greek-key type structures, as was suggested by our solid-state nmr results (figure a and d). in addition, interfacial contacts between the two protofilaments and the degree of helical twist (table s ) are quite distinct from those of the previously reported structures. these results indicate that interactions between co-factors and α-synuclein may lead to distinct molecular conformations and intermolecular contacts between the protofilaments of α-synuclein.

figure . structural comparison of various polymorphs of full-length α-synuclein filaments.
representative structures of (a) α-synuclein polymorphs a (pdb n a, a b), and polymorph b (pdb cu ). (b) α-synuclein polymorph a (pdb ssx), polymorph b (pdb sst) and tau-promoted α-synuclein polymorph (this study, pdb l h). (c) msa patient derived α-synuclein polymorph type- (pdb xyp) and type- (pdb xyo, xyq). (d-f) overlay of protofilament folds of tau-promoted α-synuclein filament with polymorphs , and ex vivo msa polymorphs, respectively.

discussion

the molecular mechanism by which α-synuclein self-assembles into fibrillar aggregates in vivo has remained largely unknown. it was previously shown that monomeric α-synuclein is stabilized by long-range interactions between the n- and c-terminal regions. perturbations of the long-range interactions may initiate misfolding and aggregation of α-synuclein. indeed, various co-factors that interact with the n- and/or c-terminal regions promoted the formation of fibrillar aggregates of α-synuclein. recently solved cryo-em structures of ex vivo α-synuclein filaments extracted from msa and dlb patients revealed that the ex vivo filaments adopt distinct molecular structures from those of in vitro α-synuclein filaments, supporting the idea that co-factors may play important roles in promoting α-synuclein aggregation in vivo. comparative structural analyses of α-synuclein aggregates derived by co-factors and ex vivo aggregates will, therefore, be required to identify co-factors that may play critical roles in α-synuclein aggregation in vivo.
several lines of evidence indicate that pathological proteins such as β-amyloid (aβ) peptides, tau and α-synuclein synergistically promote their mutual aggregation. in particular, co-existence of tau and α-synuclein aggregates in the brains of synucleinopathy patients suggests that tau may interact with α-synuclein, accelerating the formation of fibrillar α-synuclein aggregates in vivo. in this work, we solved the cryo-em structure of α-synuclein filaments derived by tau and compared it to previously reported structures of α-synuclein filaments.

previous structural studies of α-synuclein filaments revealed that α-synuclein can form diverse filamentous aggregates with distinct molecular conformations (figure ). polymorphic structures were also observed for filaments formed in the same buffer. it is plausible that multiple conformers in the conformational ensemble of disordered α-synuclein are able to form diverse α-synuclein filaments with different molecular conformations and/or different interfaces between the protofilaments (figure ). interestingly, tau-promoted α-synuclein filaments adopt a greek-key type structure similar to one of the polymorphic α-synuclein filaments. however, the detailed molecular conformation and the degree of the helical twist are different from those of the polymorphs (table s ). in addition, recent studies revealed that poly(adp-ribose) may interact with α-synuclein in vivo and induce the formation of a more toxic α-synuclein strain with distinct molecular conformations. these results suggest that the interaction between co-factors and α-synuclein may direct the protein to a specific misfolding and aggregation pathway toward the distinct α-synuclein filament, highlighting the importance of cellular environments in protein misfolding and aggregation.

previous structural studies of the full-length and truncated α-synuclein filaments revealed that the c-terminally truncated α-synuclein filaments have increased helical twists even though the full-length and truncated filaments adopt almost identical core structures, suggesting that the negatively charged c-terminal region affects the helical twist in the parallel alignment. thus, the increased helical twist of the tau-promoted full-length α-synuclein filaments may result from the electrostatic interaction between the positively charged tau and the negatively charged c-terminal region of α-synuclein, which may reduce repulsive interactions between the c-terminal regions in the parallel alignment and facilitate the tighter helical twist. the longer disordered c-terminal region (residues – ) in the tau-promoted filament compared to that of the previously reported α-synuclein filaments (residues – ) may also result from the interaction between tau and the c-terminal region of α-synuclein.

in summary, we report a distinct molecular structure of the α-synuclein filament formed in the presence of tau. the interaction between the c-terminal region of α-synuclein and tau leads to a distinct molecular conformation of the α-synuclein filament with a shorter helical pitch. these results suggest that interactions between α-synuclein and various potential co-factors in cellular environments may promote the formation of diverse α-synuclein filaments with different molecular conformations. more extensive comparative structural analyses of in vitro α-synuclein filaments derived by co-factors and ex vivo α-synuclein filaments extracted from patients are required to better understand the molecular mechanism of α-synuclein aggregation in vivo.
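the link between half-pitch and helical twist discussed above can be made concrete with a short calculation. the sketch below assumes the usual geometric definition for twisted filaments, in which one crossover (half-pitch) corresponds to a 180 degree rotation about the helical axis; the function names, the negative sign for left-handed twist and the numerical values in the usage note are illustrative assumptions, not values taken from this study.

```python
def twist_per_subunit_deg(rise_angstrom: float,
                          half_pitch_angstrom: float,
                          left_handed: bool = True) -> float:
    """Return the helical twist per subunit in degrees.

    Assumes one half-pitch (crossover distance) equals a 180 degree
    rotation of the filament, so each axial step of `rise_angstrom`
    rotates the subunit by 180 * rise / half_pitch degrees.
    The negative sign for left-handed helices is a convention chosen
    for this sketch.
    """
    if rise_angstrom <= 0 or half_pitch_angstrom <= 0:
        raise ValueError("rise and half-pitch must be positive")
    magnitude = 180.0 * rise_angstrom / half_pitch_angstrom
    return -magnitude if left_handed else magnitude


def subunits_per_crossover(rise_angstrom: float,
                           half_pitch_angstrom: float) -> float:
    """Number of stacked subunits along one half-pitch."""
    if rise_angstrom <= 0 or half_pitch_angstrom <= 0:
        raise ValueError("rise and half-pitch must be positive")
    return half_pitch_angstrom / rise_angstrom


if __name__ == "__main__":
    # Hypothetical values: 4.8 A rise, half-pitch of 600 A versus 1200 A.
    for half_pitch in (600.0, 1200.0):
        print(half_pitch, twist_per_subunit_deg(4.8, half_pitch))
```

with the rise held fixed, halving the half-pitch doubles the magnitude of the twist per subunit, which is the sense in which a filament with a shorter half-pitch is more tightly twisted. a half-pitch quoted in nm has to be multiplied by 10 before being passed in as Å.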
associated content

supporting information. fsc curve. structural comparison of tau derived α-synuclein polymorph and polymorph b. density maps showing the helical twisting patterns of α-synuclein polymorph and polymorph b. cryo-em data collection, refinement, and validation statistics. helical twists comparison of various α-synuclein polymorphs. the following files are available free of charge.

author information

corresponding author
*limk@ecu.edu.

author contributions
the manuscript was written through contributions of all authors. all authors have given approval to the final version of the manuscript.

funding sources
this work was supported in part by nih r ns (k.h.l.), r ag (r.k.) and r ns (r.k.).

notes
the authors declare no competing financial interest.

acknowledgment
we thank dr. jun-yong choe (east carolina university) for helpful discussion. we also thank hamidreza rahmani for helpful suggestions on molecular dynamics analysis on the atomic model.

abbreviations
nmr, nuclear magnetic resonance; tem, transmission electron microscopy; darr, dipolar assisted rotational resonance; cryo-em, cryo-electron microscopy.

accession codes
α-synuclein: uniprotkb entry p ; tau: uniprotkb entry p
easy kinetics: a novel enzyme kinetic characterization software

gabriele morabito , *
department of biology, university of pisa, pisa, italy
max planck institute for biology of ageing, cologne, germany
* correspondence: g.morabito@age.mpg.de
keywords: computational enzymology, enzyme kinetics
doi: . /zenodo.

abstract

this paper presents the software easy kinetics, a publicly available graphical interface that allows rapid evaluation of the main kinetic parameters of an enzyme-catalyzed reaction.
in contrast to other similar commercial software, which uses algorithms based on non-linear regression models to reach these results, easy kinetics is based on a completely different, original algorithm that requires as input the spectrophotometric measurements of ∆abs/min taken twice at only two different substrate concentrations. the results nevertheless show significant concordance with those obtained with the most common commercial software used for enzyme kinetics characterization, graphpad prism©, suggesting that easy kinetics can be used as a valid alternative for routine tests in enzyme kinetics.

introduction

the continuous and rapid evolution of modern biochemical methods makes the study of enzyme kinetics very useful both in academic research, to test how variations in a polypeptide chain of interest affect enzyme functionality, and in industrial processes, to optimize the production of molecules of interest in enzymatic reactors [ ]. the michaelis-menten reaction mechanism was proposed almost a century ago to describe how the reaction speed of enzymes is affected by the substrate concentration [ ], and it is still the core reference model for describing enzyme kinetics. this model, however, requires a few parameters to fit the raw data: v0, km and vmax. over the years, biochemists have developed several methods to evaluate these parameters from the raw data; the most widely used allow software such as graphpad prism© [ ] to apply linear or non-linear regression models [ ]. original alternative methods that determine km and vmax graphically have also been proposed [ ], but like the previous ones they require multiple spectrophotometric measurements of ∆abs/min (at least conducted in duplicate) at different substrate concentrations to precisely determine the main kinetic parameters.
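as a point of comparison, the conventional multi-point approach can be sketched in a few lines. the example below assumes the standard michaelis-menten rate law v0 = vmax·[s]/(km + [s]) and fits its lineweaver-burk linearization, 1/v0 = (km/vmax)·(1/[s]) + 1/vmax, by ordinary least squares. the function names and the synthetic data are illustrative assumptions; this is not the procedure of graphpad prism or of any other specific package, which typically rely on non-linear regression.

```python
def michaelis_menten(s: float, vmax: float, km: float) -> float:
    """Initial rate v0 predicted by the Michaelis-Menten rate law."""
    return vmax * s / (km + s)


def lineweaver_burk_fit(substrate, rates):
    """Estimate (km, vmax) by least squares on the double-reciprocal plot.

    substrate: substrate concentrations [S] (all > 0)
    rates:     measured initial rates v0 at those concentrations (all > 0)
    """
    if len(substrate) != len(rates) or len(substrate) < 2:
        raise ValueError("need at least two (substrate, rate) pairs")
    xs = [1.0 / s for s in substrate]   # 1/[S]
    ys = [1.0 / v for v in rates]       # 1/v0
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("substrate concentrations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # Km / Vmax
    intercept = mean_y - slope * mean_x   # 1 / Vmax
    vmax = 1.0 / intercept
    km = slope * vmax
    return km, vmax


if __name__ == "__main__":
    # Synthetic, noise-free data generated with hypothetical parameters.
    conc = [0.1, 0.25, 0.5, 1.0, 2.0, 5.0]
    v0 = [michaelis_menten(s, vmax=2.0, km=0.5) for s in conc]
    print(lineweaver_burk_fit(conc, v0))
```

the double-reciprocal transform is used here only because it keeps the fit linear; it weights low-substrate points heavily when the data are noisy, which is one reason non-linear regression is usually preferred for multi-point data sets.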
this paper presents an alternative method, implemented in the software easy kinetics, which determines the main kinetic parameters of an enzyme-catalyzed reaction and the corresponding kinetic graphs from spectrophotometric measurements of ∆abs/min taken twice at only two different substrate concentrations.

biorxiv preprint doi: https://doi.org/ . / ; this version posted january , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by . international license (http://creativecommons.org/licenses/by/ . /).

materials and methods

algorithm used in evaluation of km and vmax: the evaluation of km and vmax from the spectrophotometric measurements of ∆abs/min taken twice at only two different substrate concentrations is based on a trigonometric demonstration (fig. ). briefly, the algorithm takes the mean of the duplicates at each of the two substrate concentrations and transforms it into its reciprocal value, as in the lineweaver-burk reciprocal plot. once two points of this graph are known, one and only one straight line can be drawn through them. this line has an unknown inclination "a" and intersects the cartesian axes at the points 1/vmax and -1/km, which are also unknown. however, by tracing the projections of the two known points (x1,y1) and (x2,y2) on the cartesian plane, it is evident that the parallel lines y = y and y = 0 are both intersected by the studied straight line. by the alternate interior angles theorem [ ], if two parallel lines are cut by a transversal, then the pairs of alternate interior angles are congruent: so, by fig. , "a" = "a ". considering instead the lines y = y1 and y = y2, which are also parallel and intersected by the studied straight line, the same theorem implies that their alternate interior angles are congruent: so, by fig. , "a " = "a ".
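the two-point construction described so far can be sketched numerically as follows. this is a minimal illustration of the idea (mean of the duplicates, reciprocal transform, the unique straight line through the two points, then its axis intercepts), not the easy kinetics source code; the function name, argument layout and example values are assumptions made for the sketch.

```python
from statistics import mean


def two_point_km_vmax(s1, rates1, s2, rates2):
    """Estimate (km, vmax) from duplicate rates at two substrate concentrations.

    s1, s2:          the two substrate concentrations (positive, distinct)
    rates1, rates2:  replicate delta-Abs/min readings taken at s1 and at s2
    vmax is returned in the same units as the readings.
    """
    if s1 <= 0 or s2 <= 0 or s1 == s2:
        raise ValueError("substrate concentrations must be positive and distinct")
    # Reciprocal (Lineweaver-Burk) coordinates of the two mean points.
    x1, y1 = 1.0 / s1, 1.0 / mean(rates1)
    x2, y2 = 1.0 / s2, 1.0 / mean(rates2)
    # Slope of the unique line through both points: tan(a) = Km / Vmax.
    slope = (y2 - y1) / (x2 - x1)
    # y-axis intercept of that line: 1 / Vmax.
    intercept = y2 - slope * x2
    if slope <= 0 or intercept <= 0:
        raise ValueError("points are not consistent with Michaelis-Menten kinetics")
    vmax = 1.0 / intercept
    # The x-axis intercept is -1/Km = -intercept/slope, hence Km = slope/intercept.
    km = slope / intercept
    return km, vmax


if __name__ == "__main__":
    # Hypothetical duplicate readings at [S] = 0.25 and [S] = 2.0.
    print(two_point_km_vmax(0.25, [0.66, 0.67], 2.0, [1.59, 1.61]))
```

because the line is fixed by exactly two points, any measurement error passes straight into km and vmax; averaging the duplicates is the only noise reduction in this scheme, so the two concentrations are best chosen well apart.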
the angle congruences above imply that:

\[ \tan(a) = \frac{y_2 - y_1}{x_2 - x_1} \]

but also $\frac{z}{x_2} = \tan(a)$, with $z = y_2 - \frac{1}{V_{max}}$, so:

\[ \frac{1}{V_{max}} = y_2 - z = y_2 - \left(\tan(a) \cdot x_2\right) = y_2 - x_2 \cdot \frac{y_2 - y_1}{x_2 - x_1} \]

once $\frac{1}{V_{max}}$ has been calculated, the value of $\frac{1}{K_m}$ can be determined as follows:

\[ \left| -\frac{1}{K_m} \right| = \frac{1/V_{max}}{\tan(a)} \]

inverting the two previous values gives vmax and km.

[equations not recoverable from the source text; their captions, in order:] equation used for the generation of the kinetic graph; equation used for the evaluation of the v0 at a set chosen substrate concentration; equation used to switch the previously evaluated v0, expressed in ∆abs/min, into a new v0 value expressed in μmol of reporter product generated per minute; equation used for the evaluation of the enzymatic units in the sample; equation used for the evaluation of the protein concentration during the bradford assay; equation used for the evaluation of the enzyme's specific activity; equation used for the evaluation of the enzyme's kcat; equation used for the evaluation of the enzyme's catalytic efficiency; equation used for the evaluation of the hill coefficient.
where [s] represents the substrate concentration; si can be , if substrate inhibition is present or , if substrate inhibition is absent; ki represents the inhibition constant evaluated at a very high substrate concentration, defined separately for the cases in which substrate inhibition is present and absent; lf represents the final volume of the sample; li represents the starting volume of the sample; ε represents the molar extinction coefficient of the product; o represents the optical path of the spectrophotometer; abshigh represents the absorbance measured at a very high substrate concentration; absprotein represents the absorbance of the protein solution; absblank represents the absorbance measured for the same solution without protein; p.m. represents the molecular weight of the reporter product. enzyme's ∆abs/min raw data for several concentrations of tested limiting substrates: tab. experimentally measured ∆abs/min values for several substrate concentrations in the enzyme-catalyzed reactions tested. degradation of photoreceptor outer segments by the retinal pigment epithelium requires pigment epithelium-derived factor receptor (pedf-r) jeanee bullock , *, federica polato *, mones abu-asab , alexandra bernardo-colón , elma aflaki , martin-paul agbaga , s.
patricia becerra a section of protein structure and function-lrcmb, national eye institute, national institutes of health, bethesda, md; department of biochemistry and molecular & cellular biology, georgetown university medical center, washington d.c.; section of histopathology, national eye institute, national institutes of health, bethesda, md, departments of cell biology and ophthalmology, dean mcgee eye institute, university of oklahoma hsc, oklahoma city, ok *these authors contributed equally to this work. acorresponding author: s. patricia becerra nih-nei-lrcmb section of protein structure and function bg. , rm. center drive msc bethesda, md - becerrap@nei.nih.gov present address: jb: fort washington, md, usa; fp: washington dc, usa; ea: national institute of alcohol abuse and alcoholism, nih funding information: this work was supported by the intramural research program of the national eye institute, nihey to spb and by nih/nei r ey to mpa. word count: j. bullock, none; f. polato, none; m. abu-asab, none; a. bernardo-colón, none; e. aflaki, none; m.p. agbaga, none; s. p. becerra, none. abbreviations: amd, age-related macular degeneration; bel, bromoenol lactone; β-hb, beta hydroxybutyrate; cre, cyclization recombinase; dha, docosahexaenoic acid; loxp, locus of x-over, p ; pedf-r, pigment epithelium-derived factor receptor; pnpla , patatin-like phospholipase domain containing ; pos, photoreceptor outer segments; roi, regions of interest; rpe, retinal pigment epithelium; tem, transmission electron microscopy; wt, wild type mailto:becerrap@nei.nih.gov abstract purpose: to examine the contribution of pedf-r to the phagocytosis process. previously, we identified pedf-r, the protein encoded by the pnpla gene, as a phospholipase a in the retinal pigment epithelium (rpe). during phagocytosis, rpe cells ingest abundant phospholipids and protein in the form of photoreceptor outer segment (pos) tips, which are then hydrolyzed. 
the role of pedf-r in rpe phagocytosis is not known. methods: mice in which pnpla was conditionally knocked out in the rpe were generated (cko). mouse rpe/choroid explants were cultured. human arpe- cells were transfected with sipnpla silencing duplexes. pos were isolated from bovine retinas. the phospholipase a inhibitor bromoenol lactone was used. transmission electron microscopy, immunofluorescence, lipid labeling, pulse-chase experiments, western blots, and free fatty acid and β-hydroxybutyrate assays were performed. results: the rpe of the cko mice accumulated lipids as well as more abundant and larger rhodopsin particles compared to littermate controls. upon pos exposure, rpe explants from cko mice released less β-hydroxybutyrate compared to controls. after pos ingestion during phagocytosis, rhodopsin degradation was stalled both in cells treated with bromoenol lactone and in pnpla -knocked-down cells relative to their corresponding controls. phospholipase a inhibition lowered β-hydroxybutyrate release from phagocytic rpe cells. pnpla knock down also resulted in a decline in fatty acids and β-hydroxybutyrate release from phagocytic rpe cells. conclusions: pedf-r downregulation delayed pos digestion during phagocytosis. the findings imply that efficiency of rpe phagocytosis depends on pedf-r, thus identifying a novel contribution of this protein to pos degradation in the rpe. a vital function of the retinal pigment epithelium (rpe) is to phagocytose the tips of the photoreceptors in the neural retina. as one of the most active phagocytes in the body, rpe cells ingest daily a large amount of lipids and protein in the form of photoreceptor outer segments (pos) tips. – on the one hand, as outer segments are constantly being renewed at the base of photoreceptors, the ingestion of pos tips (~ % of an outer segment) by rpe cells serves to balance outer segment renewal, which is necessary for the visual activity of photoreceptors. 
on the other hand, the ingested pos supply an abundant source of fatty acids, which are substrates for fatty acid β-oxidation and ketogenesis to support the energy demands of the rpe. the fatty acids liberated from phagocytosed pos are also used as essential precursors for lipid and membrane synthesis, and as bioactive mediators in cell signaling processes; for example, the main fatty acid in pos phospholipids is docosahexaenoic acid, which is involved in signaling in the retina. rhodopsin, a pigment present in rod photoreceptors and involved in visual phototransduction, is the most abundant protein in pos. approximately % of the total protein of isolated bovine pos is rhodopsin, which is embedded in a phospholipid bilayer at a molar ratio between rhodopsin and phospholipids of about : . conversely, the rpe lacks expression of the rhodopsin gene. the importance of pos clearance by the rpe in the maintenance of photoreceptors was demonstrated in an animal model of retinal degeneration, the royal college of surgeons (rcs) rat, in which a genetic defect renders the rpe unable to effectively phagocytose pos, thereby leading to rapid photoreceptor degeneration. moreover, human rpe phagocytosis declines moderately with age and the decline is significant in rpe of human donors with age-related macular degeneration (amd), underscoring its importance in this disease. therefore, there is increasing interest in studying regulatory hydrolyzing enzymes involved in rpe phagocytosis for maintaining retina function and the visual process. we have previously reported that the human rpe expresses the pnpla gene, which encodes a amino acid polypeptide that exhibits phospholipase a (pla ) activity and is termed pigment epithelium-derived factor receptor (pedf-r). the enzyme liberates fatty acids from phospholipids, specifically those in which dha is in the sn- position.
rpe plasma membranes contain the pedf-r protein, , and photoreceptor membrane phospholipids have high content of dha in their sn- position, suggesting that upon pos ingestion the substrate lipid is available to interact with pedf-r. other laboratories used different names for the pedf- r protein (e.g., ipla ζ, desnutrin, adipose triglyceride lipase), and showed that it exhibits additional lipase activities: triglyceride lipase and acylglycerol transacylase enzymatic activities. – in macrophages, the triglyceride hydrolytic activity is critical for efficient efferocytosis of bacteria and yeast. interestingly, we and others have shown that the inhibitor of calcium-independent phospholipases a (ipla s), bromoenol lactone (bel), inhibits the phospholipase and triolein lipase activities of pedf-r/ipla ζ. , in addition, bel can impair the phagocytosis of pos by arpe- cells, associating phospholipase a activity with the regulation of photoreceptor cell renewal. however, the responsible phospholipase enzyme involved in rpe phagocytosis is not yet known. given that the role of pedf-r in rpe phagocytosis has not yet been studied, here we explored its contribution in this process. we hypothesized that pedf-r is involved in the degradation of phospholipid-rich pos in rpe phagocytosis. to test this hypothesis, we silenced the pnpla gene in vivo and in vitro. results show that with down regulation of pnpla expression and inhibition of the pla activity of pedf-r, rpe cells cannot break down rhodopsin, nor release β-hydroxybutyrate (β-hb) and fatty acids, thus identifying a novel contribution of this protein in pos degradation. we discuss the role that pedf-r may play in the disposal of lipids from ingested os, and in turn in the regulation of photoreceptor cell renewal. methods animals the generation of desnutrin floxed mice (hereafter referred to as pnpla f/f) and the tg(best - cre)jdun transgenic line (which will be named best -cre in this report) have been previously reported. 
the desnutrin floxed transgenic mouse model was kindly donated to our laboratory by dr. hei sook sul. the transgenic tg(best -cre)jdun mouse model was a generous gift by dr. joshua dunaief. it is an rpe-specific, cre-expressing transgenic mouse line, in which the activity of the human best promoter is restricted to the rpe and drives the rpe-specific expression of the targeted cre in the eye of transgenic mice. homozygous floxed pnpla (pnpla f/f) mice were crossed with transgenic best -cre mice. the resulting mice carrying one floxed allele and the cre transgene (pnpla f/+/cre) were crossed with pnpla f/f mice to generate mice with pnpla knockout specifically in the rpe, which are homozygous floxed mice expressing the cre transgene only in the rpe, pnpla f/f/cre (here also termed cko). pnpla f/f/cre or pnpla f/+/cre were also used for breeding with pnpla f/f to expand the colony. pnpla f/+ or pnpla f/f littermates, obtained through this breeding, were used as control mice. all procedures involving mice were conducted following protocols approved by the national eye institute animal care and use committee and in accordance with the association for research in vision and ophthalmology statement for the use of animals in ophthalmic and vision research. the mice were housed in the nei animal facility with lighting at around - lux in h ( am- pm) light/ h dark ( pm- am) cycles. dna isolation dna was isolated from mouse eyecups using the salt-chloroform dna extraction method and dissolved in µl of te (tris-edta composed of mm tris-hcl, ph , and mm edta). aliquots ( µl) of the dna solution were then used for each pcr reaction using oligonucleotide primers p and p (sequences kindly provided by the laboratory of dr. hei sook sul; table ). rna extraction, cdna synthesis, and quantitative rt-pcr rna was isolated from the mouse rpe following the methodology previously described. 
total rna was purified from arpe- cells using the rneasy® mini kit (qiagen, germantown, md) following the manufacturer’s instructions. between - ng of total rna were used for reverse transcription using the superscript iii first-strand synthesis system (invitrogen, carlsbad, ca). the pnpla transcript levels in arpe cells determined by quantitative rt-pcr were normalized using the quantitect sybr green pcr kit (qiagen) in the quantstudio flex real-time pcr system (thermo fisher scientific, waltham, ma). the primer sequences used in this study are listed in table . murine pnpla mrna levels relative to hprt transcript levels were measured by the quantstudio flex real-time pcr system using taqman® gene expression assays (pnpla , mm _m ; hprt, mm _m , thermo fisher scientific). pnpla relative expression to hprt was calculated using the comparative ΔΔct method. eyecup flatmounts eyecup (rpe, choroid, sclera) flatmounts were prepared and processed as follows. after enucleation, and removal of cornea, lens, and retina, eyecups were fixed for h in % paraformaldehyde at room temperature, and washed times for min each in tris-buffered saline (tbs; mm tris hcl ph . , mm nacl, . mm kcl). they were then blocked for h with % normal goat serum (ngs) in . % tbs-ta (tbs containing . % triton-x, sigma, st. louis, mo). primary antibodies against cre recombinase and rhodopsin (see table ) in . % tbs-ta containing % ngs were diluted and used at °c for h. then, the eyecups were washed times for min each with tbs-ta followed by incubation at room temperature for h with the respective secondary antibodies, using dapi (to counterstain the nuclei) and alexa fluor -phalloidin (to label the rpe cytoskeleton) diluted in . % tbs-ta containing % ngs. eyecups were then flattened by introducing incisions and mounted with prolong gold antifade reagent (thermo fisher scientific). 
images of the entire flatmounts were collected using the tiling feature of the epifluorescent axio imager z microscope (carl zeiss microscopy, white plains, ny) at x magnification. the collected images were stitched together using the corresponding feature of the zen blue software (carl zeiss microscopy). eyecups were also imaged using confocal microscopy (zeiss lsm ) at x magnification collecting z-stacks spanning µm from each other and covering from the basal to the apical surface of the rpe cells. the image resulting from the maximum intensity projection of the z-stacks was employed for analysis. five regions of interest (roi; µm x µm) were selected for each image of the flatmount from cko mice and control mice. the percentage of cre-positive cells was determined by dividing the number of cells containing cre-stained nuclei by the number of rpe cells in each roi (identified by f-actin staining). for phagocytosis assay, at least six roi ( . µm x . µm) were analyzed per mouse. rhodopsin-stained particles were counted using image j, after adjusting the color threshold and size of the particles to eliminate the background. transmission electron microscopy mouse eyes were enucleated and doubly-fixed in . % glutaraldehyde in pbs and . % osmium tetroxide in pbs and embedded in epoxy resin. thin sections ( nm in thickness) sections were generated and placed on -mesh copper grids, dried for h, and double-stained with uranyl acetate and lead citrate. sections were viewed and photographed with a jeol jm- transmission electron microscope. electroretinography (erg) in dim red light, overnight dark-adapted mice were anesthetized by intraperitoneal (ip) injection of ketamine ( . mg/kg) and xylazine ( . mg/kg). pupils were dilated with a mixture of % tropicamide and . % phenylephrine. a topical anesthetic, tetracaine ( . %), was administered before positioning the electrodes on the cornea for recording. 
erg was recorded from both eyes by the espion e system with colordome (diagnosys llc, lowell, ma, usa). dark-adapted responses were elicited with increasing light impulses with intensity from . to candela-seconds per meter squared (sc cd.s/m ). light-adapted responses were recorded after min adaptation to a rod-saturating background ( cd/m ) with light stimulus intensity from . to sc cd.s/m . during the recording, the mouse body temperature was maintained at °c by placing the mouse on a heating pad. amplitudes for a-wave were measured from baseline to negative peak, and b-wave amplitudes were measured from a-wave trough to b-wave peak. dc erg for dc-erg, silver chloride electrodes connected to glass capillary tubes filled with hank’s buffered salt solution (hbss) were used for recording. the electrodes were kept in contact with the cornea for minutes minimum until the electrical activity reached steady-state. responses to -min steady light stimulation were recorded. cell culture human arpe- cells (atcc, manassas, va, usa, cat. # crl- ) were maintained in dulbecco’s modified eagle medium/nutrient mixture f- (dmem/f- ) (gibco; grand island, ny) supplemented with % fetal bovine serum (fbs) (gibco) and % penicillin/streptomycin (gibco) at °c with % co . for assays described below, a total of x cells in . ml were plated per well of -well plates and incubated for days in dmem/f with % fbs and % penicillin-streptomycin. arpe- cells were authenticated by bio-synthesis (lewisville, tx) at passage . arpe- cells in passage numbers - were used for all experiments. silencing of pnpla in arpe- cells using sirna small interfering rna (sirna) oligo duplexes of bases in length for human pnpla were purchased from origene (rockville, md). their sequences, and that of a scramble sirna (scr) (cat#: sr and sr ), are given in table .
from the six duplexes, sirnas c, d, and e consistently provided the highest silencing efficiency and therefore these three duplexes were used individually for silencing experiments and referred to as sipnpla . arpe- cells were transfected by reverse transfection in -well tissue culture plates as follows: a total of pmols of sirna was diluted in µl of optimem (gibco) per well, mixed with µl of lipofectamine rnaimax (invitrogen), and mock transfected cells received only µl of lipofectamine. then the mixture was added to each well. after incubation at room temperature for min, a total of x cells in µl antibiotic-free dmem/f containing % fbs was added to each well and the plate was swirled gently to mix. assays were performed h post-transfection. phagocytosis of bovine pos by arpe- cells pos were isolated as previously described from freshly obtained cow eyes (j.w. treuth & sons, catonsville, md). pos pellets were stored at - °c until use. quantification of pos units was performed using trypan blue and resulted in an average of x pos units per bovine eye. the concentration of protein from purified pos was pg/pos unit. proteins in the pos samples resolved by sds-page had the expected migration pattern for both reduced and non- reduced conditions, and the main bands stained with coomassie blue comigrated with rhodopsin-immunoreactive proteins in western blots of pos proteins (fig. s ). the percentage of rhodopsin in the protein content of pos was estimated from the gels and revealed that % or more of the protein content corresponded to rhodopsin. using electrospray ionization-mass spectrometry-mass spectrometry (esi-msms) as previously described, we determined the lipid composition of the pos that were fed to the arpe- cells. 
phagocytosis assays in arpe- cells were performed as follows: arpe- cells ( x cells per well) were attached to -well plates (commercial tissue culture-treated polystyrene plates, tcps, purchased from corning, corning, ny) and cultured for days to form confluent and polarized cell monolayers, as we reported previously. ringer’s solution was prepared and composed of the following: . mm nacl, . mm nahco , . mm kcl, . mm mgcl , and . mm cacl , with mm hepes dissolved separately and adjusted to ph . with n- methyl-d-glucamine. prior to use, l-carnitine was added to the ringer’s solution to achieve a mm final concentration of l-carnitine. purified pos were diluted to a concentration of x pos/ml in ringer’s solution containing freshly prepared mm glucose. a total of µl of this solution (medium) was added to each well and the cultures were incubated for min, min or . h, at °c. for pulse-chase experiments, after . h of incubation with pos (pulse), media with pos were removed from the wells and replaced with dmem/f containing % fbs and continue incubation for a total of h. the media were separated from the attached cells and stored frozen until use, and the cells were used for preparing protein extracts and either used immediately or stored frozen until used. for experiments using bel (sigma), bel dissolved in vehicle dimethyl sulfoxide (dmso) was mixed with ringer’s solution and the mixture added to the cells and incubated for h prior to starting the phagocytosis assays. the mixture was removed and replaced with the pos mixture as described above containing dmso or bel during the pulse. the assays were performed in duplicate wells per condition and each set of experiments were repeated at least two times. cell viability by crystal violet staining arpe- cells were seeded in a -well plate at a density of x cells per well. the cells were incubated at °c for d. 
the medium was removed and replaced with ringer’s solution containing various concentrations of bel, and incubation was continued at °c for . h. the medium was replaced with complete medium and the cultures were incubated for a total of h. after two washes of the cells with deionized h o, the plate was inverted and tapped gently to remove excess liquid. a total of µl of a . % crystal violet (sigma) staining solution in % methanol was added to each well and incubated at room temperature for min on a bench rocker with a frequency of oscillations per min. the cells in the wells were briefly washed with deionized h o, and then the plates were inverted and placed on a paper towel to air dry without a lid for min. for crystal violet extraction, µl of methanol were added to each well and the plate was covered with a lid and incubated at room temperature for min on a bench rocker set at oscillations per min. the absorbance of the plate was measured at nm. western blot arpe- cells plated in multiwell cell culture dishes were washed twice with ice-cold dpbs ( mm nacl, mm na hpo - h , . mm kh po , . mm kcl, μm mgcl - h , μm cacl , ph . ). a total of µl of cold ripa lysis and extraction buffer (thermo fisher scientific) with protease inhibitors (roche, indianapolis, in, added as per manufacturer’s instructions) was added to each well and the plate was incubated on ice for min. cell lysates were collected and sonicated for s with a % pulse (fisher scientific sonic dismembrator model , hampton, nh), and cellular debris was removed from soluble cell lysates by centrifugation at , x g at °c for min. protein concentration in the lysates was determined using the pierce™ bca protein assay kit (thermo fisher scientific) and the cell lysates were stored at - °c until use. between - µg of cell lysates were used for western blots. proteins were resolved by sds-page and transferred to nitrocellulose membranes for immunodetection. the antibodies used are listed in table .
for pedf-r immunodetection, membranes were incubated in % bsa (sigma) in tbs-tb ( mm tris ph . , mm nacl containing . % tween- (sigma) at room temperature for h. then they were incubated in a solution of primary antibody against human pedf-r at : in % bsa/tbs-tb at °c for over h. membranes were washed vigorously with tbs-tb for min and incubated with anti- rabbit-hrp (kindlebio, greenwich, ct) diluted : in % bsa/tbs-tb at room temperature for min. the membranes were washed vigorously with tbs-tb for min and immunoreactive proteins were visualized using the kwikquant imaging system (kindlebio). for rhodopsin immunodetection, membranes were incubated in % dry milk (nestle, arlington, va) in pbs-t ( mm nacl, . mm kcl, mm na hpo , mm kh po , ph . , . % tween ) at room temperature for h. then, the membranes were incubated in a solution of primary antibody against human rhodopsin (novus, littleton, co) at : in a suspension of % dry milk in pbs-t at °c for over h. the membranes were washed vigorously with pbs-t for min and followed with incubation in a solution of anti-mouse-hrp (kindlebio) : in % milk in pbs-t at room temperature for min. the membranes were washed vigorously with pbs-t for min and immunoreactive proteins were visualized using the kwikquant imaging system. for protein loading control, the antibodies in membranes as processed described above were removed using restore™ western blot stripping buffer (thermo fisher scientific), sequentially followed by incubation with blocking % bsa in tbs-t at room temperature for h, a solution of primary antibody against gapdh (genetex, cat. # gtx , irvine, ca) : , in % bsa/tbs-t at °c for over h. after washing the membranes vigorously with tbs-t at room temperature for min, they were incubated in a solution of anti-mouse-hrp at : in % bsa/tbs-t at room temperature for min. after washes with tbs-t as described above, the immunoreactive proteins were visualized using the kwikquant imaging system. 
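the text does not state how band intensities were quantified from these blots. as a hedged sketch of one common approach, assuming densitometry values are available for each lane, the python function below divides the target band (for example rhodopsin) by the loading control from the same lane (for example gapdh) and expresses each lane as a percentage of a chosen reference lane. the function name, lane labels and intensity values are hypothetical.

```python
def percent_of_reference(target, loading, reference_lane):
    """Loading-normalized band intensity per lane, as a percentage of a reference lane.

    target         -- dict mapping lane -> target band intensity (e.g. rhodopsin)
    loading        -- dict mapping lane -> loading-control intensity (e.g. GAPDH)
    reference_lane -- lane whose normalized signal is defined as 100 %
    """
    # Correct each lane for the amount of protein loaded.
    normalized = {lane: target[lane] / loading[lane] for lane in target}
    ref = normalized[reference_lane]
    return {lane: 100.0 * value / ref for lane, value in normalized.items()}


# Hypothetical densitometry values in arbitrary units.
rhodopsin = {"pulse": 800.0, "chase": 300.0}
gapdh = {"pulse": 1000.0, "chase": 1200.0}
print(percent_of_reference(rhodopsin, gapdh, "pulse"))
```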
β-hydroxybutyrate quantification assay in mice, the assay was performed as described before. briefly, after the removal of the cornea, lens and retina, optic nerve, and extra fat and muscles, the eyecup explant from one eye was placed in a well of a -well plate containing µl ringer’s solution and the eyecup from the contralateral eye in another well with the same volume of ringer’s solution containing mm glucose and purified bovine pos ( µm phospholipid content, a kind gift from dr. kathleen boesze-battaglia). the eyecup explant cultures were then incubated for h at °c with % co and, the media were collected and used immediately or stored frozen until use. in arpe- cells, at the endpoint of the phagocytosis assay as described above, a total of µl of the culturing medium was collected and used immediately or stored at °c until use. the levels of β- hydroxybutyrate (β-hb) released from the rpe cells were determined in the collected samples using the enzymatic activity of β-hb dehydrogenase in a colorimetric assay from the stanbio beta-hydroxybutyrate liquicolor test (stanbio cat. # ; boerne, tx) with β-hb standards and following manufacturer’s instructions. free fatty acids quantification assay a total of µl of conditioned medium from arpe- cell cultures were collected and used to quantify free fatty acids using the free fatty acid quantification assay kit (colorimetric) (abcam cat. # ab ; cambridge, ma) following manufacturer’s instructions. statistical analyses data were analyzed with the two-tailed unpaired student t test or -way anova (analysis of variance), and are shown as the mean ± standard deviation (sd). p values lower than . were considered statistically significant. results generation of an rpe-specific pnpla -ko mouse to circumvent the premature lethality of pnpla -ko mice, a mouse model with rpe-specific knockout of the pnpla gene was designed. 
for this purpose, we crossed pnpla f/f mice with best -cre transgenic mice to obtain mice with conditional pnpla -knockout specific to the rpe, hereafter referred to as cko (or pnpla f/f/cre). in the cko mice, the promoter of the rpe-specific gene vmd (human bestrophin, here referred to as best ) drives the expression of cre (cyclization recombinase) and restricts it to the rpe. these mice carry two floxed alleles in the pnpla gene and a copy of the best -cre transgene (pnpla f/f/cre). we performed pcr reactions with primers p and p , upstream and downstream from the loxp sites flanking exon , respectively (fig. a), with dna extracted from cko eyecups and found that the amplimers had the expected length of bp corresponding to the recombined (cko) allele (fig. b), thus showing that the cre-loxp recombination occurred successfully and led to the deletion of the floxed region (exon ) in the rpe of cko mice (or pnpla f/f/cre). conversely, we observed two pcr bands of bp and bp for littermate pnpla f/+ control mice carrying a wt and a floxed allele, respectively (the floxed allele contains two loxp sites) (fig. b). in lanes for the cko (or pnpla f/f/cre), we also observed very low intensity bands migrating at positions corresponding to bp and bp, which probably resulted from a few unsuccessful recombination events. reverse transcriptase pcr (rt-pcr) revealed that pnpla transcript levels in the rpe were lower in cko mice than in controls (with a mean that was about % of the control mice) (fig. c). we determined the percentage of rpe cells that produced the cre protein by immunofluorescence of rpe whole flatmounts. cells were visualized by co-staining with fluorescein-labelled phalloidin to detect the actin cytoskeleton. we observed cre-immunoreactivity in the rpe flatmounts isolated from cko mice, while no cre-labeling was detected in the controls (fig. d).
the overall distribution was patchy and mosaic, as previously described for the best -cre mice. the percentage of cre-positive cells in roi (regions of interest) of flatmounts showed nine mice with expected percentages of cre-positive cells in rpe and one with low cre-positivity (fig. e). the average of the mean values of cre-positive cells for each cko mouse (mouse numbers , , - ) was % (ranging between %- %), which was within the expected for cre positivity in the rpe of the best -cre mouse. cre-positive cells were not detected in rpe of control animals (fig. d-e). unfortunately, further protein analysis of pedf-r in mouse retinas was not conclusive because several commercial antibodies to pedf-r gave high background by immunofluorescence and in western blots. nevertheless, the results demonstrate the successful generation of rpe-specific pnpla -knock-down mice. lipid accumulates in the rpe of pnpla -cko mice we examined the ultrastructure of the rpe by tem imaging. accumulation of large lipid droplets (lds) was observed in cko mice as early as months of age compared to the control mice cohort (fig. a), and lds were still observed in the rpe of -month old pnpla -cko compared to controls (fig. b). the presence of lds was associated with either the lack (normally seen in the basal side) (fig. s a, s h) or the decreased thickness of the basal infoldings, and with granular cytoplasm, abnormal mitochondria (fig. s b), and disorganized localization of organelles (mitochondria and melanosomes) (fig. s a). in some cells, lds crowded the cytoplasm and clustered together the mitochondria and melanosomes into the apical region of the cells (figs. s a, s c, s d); however, the number and expansion of lds within the cells appeared to be random (fig. s e). normal apical cytoplasmic processes were lacking; and degeneration in the outer segment (os) tips of the photoreceptors was apparent (figs. s a, s f). 
additionally, normal phagocytosis of the os by rpe cells was not evident, implying certain degree of impairment (figs. s a, s e, s g). there were apparent unhealthy nuclei with pyknotic chromatin and leakage of extranuclear dna (endna), indicating the beginning of a necrotic process (fig. s b). some rpe cells had lighter low-density cytoplasm indicating degeneration of cytoplasmic components in contrast to the denser and fuller cytoplasm in the rpe of the littermate controls (fig. s i, s j). thus, these observations imply that pnpla down regulation caused lipid accumulation in the rpe. pnpla deficiency increases rhodopsin levels in the rpe of mice because the rpe does not express the rhodopsin gene, the level of rhodopsin protein in the rpe cells is directly proportional to their phagocytic activity. , to investigate how the knock down of pnpla affects rpe phagocytic activity in mice, we compared the rhodopsin-labeled particles present in the eyecup of cko mice and those of control mice at -h and -h post-light onset in vivo. the rois for the mutant mice were selected from areas rich in cre-positive cells. phalloidin labeled flatmounts of control mice (n= ) showed that the rpe cells had the typical cobblestone morphology, while nine out of ten cko mice had distorted cell morphology. rhodopsin was detected in all rois and the labeled particles were more intense and larger in size in the majority of cko flatmounts compared to those in the control mice. representative rois are shown in figure a. the observations implied that pnpla knock down in the rpe prevented rhodopsin degradation in vivo. ketogenesis upon rpe phagocytosis in explants from cko mice is impaired given that rpe phagocytosis is linked to ketogenesis, we also measured the levels of ketone body β-hb released by rpe/choroid explants of the cko mice ex vivo and compared them with those of control littermates. 
the experiments were performed at -h ( am) and -h ( pm) post-light onset, a time of day in which the amount of β-hb released due to endogenous phagocytosis is not expected to vary with time. a phagocytic challenge by exposure to exogenous bovine os increased the amount of β-hb released by explants from both cko and control littermates compared to the β-hb released under basal condition (without addition of exogenous os) (fig. b). the os-mediated increase in β-hb release above basal levels of the cko rpe/choroid explants ( . nmols at am, . nmols at pm) was lower than the one of the control explants ( nmols at am and . nmols at pm) (fig. c). these observations reveal a deficiency in β-hb production by the rpe/choroid explants of cko mice under phagocytic challenge ex vivo. electroretinography of the cko mouse to examine the functionality of the retina and rpe of cko mice, we performed erg and dc- erg. figure shows histograms that revealed no differences among the animals, implying that the functionality was not affected in the rpe-pnpla -cko mice. phagocytic arpe- cells engulf and break down pos protein and lipid the complexity of the interactions that occur in the native retina makes it difficult to evaluate the subcellular and biochemical changes involved in phagocytosis of pos. cultured rpe cells provide an ideal alternative to perform these studies. accordingly, we designed and validated an assay with a human rpe cell line, arpe- , to which we added pos isolated from bovine retinas, as described in methods. the lipid composition of the pos fed to the arpe- cells included phosphatidylcholine (pc) containing very long chain polyunsaturated fatty acids (vlc- pufas) that was ~ relative mole percent of total pc species in the pos. the other major pc species include pc : , pc : , and pc : , comprising ~ relative mole percent of the total pc phospholipids. 
the most abundant phosphatidylethanolamine (pe) species in the pos were pe : , pe : , and pe : that accounts for about relative mole percent of the total pe phospholipids. the confluent monolayer of cells was exposed to the purified pos membranes for up to . h and then the ingested pos were chased for h for pulse-chase experiments. the fate of rhodopsin, the main protein in pos, was followed by western blotting of cell lysates. rhodopsin was detected in the cell lysates as early as min and its levels increased at h and . h during the pos pulse, and decreased with a h chase (fig. s a). quantification revealed that rhodopsin levels were % of those detected after . h of pos supplementation (fig. s b). free fatty acid and β-hb levels were also determined in the culture media during the pulse. the levels of free fatty acids in the medium of pos-challenged arpe- cells were -, -, and -fold higher at min, min and . h of incubation, respectively, relative to those in the medium of cells not exposed to pos (fig. s c). the β-hb levels released into the medium after pos addition also increased by -, . - and -fold after min, min and . h incubations, respectively, relative to those observed in the medium of cells not exposed to pos (fig. s d). altogether, these results show that under the specified conditions in this study, the batch of arpe- cells phagocytosed, i.e., engulfed and digested bovine pos protein and lipid components. bromoenol lactone blocks the degradation of pos components in phagocytic arpe- cells we investigated the role of pedf-r pla activity in rpe phagocytosis. as we have previously described, a calcium-independent phospholipase a inhibitor, bromoenol lactone (bel), inhibits pedf-r pla enzymatic activity. first, we determined the concentrations of bel that would maintain viability of arpe- cells. figure a shows the concentration response curve of bel on arpe- cell viability. the bel concentration range tested was between . 
and μm and the hill plot estimated an ic (concentration that would lower cell viability by %) of . μm bel. therefore, to determine the effects of bel on the arpe- phagocytic activity, cultured cells were preincubated with the inhibitor at concentrations below the ic for cell viability prior to pulse-chase assays designed as described above. pretreatment with dmso alone without bel was assayed as a control. interestingly, the inhibitor at μm and μm blocked more than % of the degradation of rhodopsin during pos chase for h in arpe- cells (figs. b- c). similar blocking effects of bel ( µm) were observed with time up to h during the chase (figs. d- e). the inhibitor did not appear to affect rhodopsin ingestion. the rhodopsin levels in pulse-chase assays with cells pretreated with dmso alone were like those without pretreatment (compare figs. b and s a). the cells observed under the microscope after the chase point and prior to the preparation of cell lysates had similar morphology and density among cultures with and without pos, and cultures before and after pulse. moreover, bel blocked % of the β-hb releasing activity of arpe- cells, whereas dmso alone did not affect the activity (fig. f). these observations demonstrate that while binding and engulfment were not affected by bel under the conditions tested, phospholipase a activity was required for rhodopsin degradation and β-hb release by arpe- cells during phagocytosis. pnpla down regulation in arpe- cells impairs pos degradation we also silenced pnpla expression in arpe- cells to investigate the possible requirement of pedf-r for phagocytosis. first, we tested the silencing efficiency of six different sirnas designed to target pnpla , along with a scrambled sirna sequence (scr) as negative control (see sequences in table ). the sirna-mediated knockdown of pnpla resulted in significant decreases in the levels of pnpla transcripts (sirna a, c, d and e, figs. 
a and s ) with a concomitant decline in pedf-r protein levels (sirna c, d and e, fig. d) in arpe- cell extracts. the sirnas with the highest efficiency of silencing pnpla mrna (namely c, d, and e) were individually used for subsequent experiments, and denoted as sipnpla (fig. a). a time course of sipnpla transfection revealed that the gene was silenced as early as h and throughout h post-transfection and parallel to pulse-chase ( . h, figs. b, s ). there was no significant difference between mock transfected cells and cells transfected with scr (fig. c). examining the cell morphology under the microscope, we did not notice differences between the scrambled and sipnpla -transfected cells. western blots showed that protein levels of pedf-r in arpe- membrane extracts declined h post- transfection (fig. d). thus, subsequent experiments with cells in which pnpla was silenced were performed h after transfection. second, we tested the effects of pnpla silencing on arpe- cell phagocytosis. here we monitored the outcome of rhodopsin in pulse-chase experiments. interestingly, while pnpla knock down did not affect ingestion, the sipnpla -transfected cells failed to degrade the ingested pos rhodopsin ( % and % remaining at h and at h, respectively), while scr- transfected cells were more efficient in degrading them ( % and % remaining at and h respectively) (figs. a- b). third, we also determined the levels of secreted free fatty acids and β-hb production in pnpla silenced cells at . h, h, and . h following pos addition. free fatty acid levels in the culture medium were lower in sipnpla -transfected cells than in cells transfected with scr at min post-addition of pos, and no difference was observed between sipnpla and scr at h and . h post-addition (fig. c). secreted β-hb levels in the culture medium were lower in sipnpla cells than in scr-transfected cells at all time points (fig. d). 
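the percentages of rhodopsin remaining after chase reported above follow from a simple ratio, which can be sketched in python. this is an illustrative sketch under stated assumptions, not the quantification procedure used in the study: it assumes that each rhodopsin band intensity is first divided by a loading-control band from the same lane, and that the normalized value at the end of the pulse is taken as the 100% reference. the function names and the example intensities are hypothetical.

```python
def normalized_signal(rhodopsin_band: float, loading_band: float) -> float:
    """Divide a rhodopsin band intensity by its loading-control band.

    Assumption: a loading control is available for each lane.
    """
    if loading_band <= 0:
        raise ValueError("loading-control intensity must be positive")
    return rhodopsin_band / loading_band


def percent_remaining(chase_signal: float, pulse_end_signal: float) -> float:
    """Rhodopsin left after the chase, as a percentage of the end-of-pulse value."""
    if pulse_end_signal <= 0:
        raise ValueError("end-of-pulse signal must be positive")
    return 100.0 * chase_signal / pulse_end_signal


def percent_degraded(chase_signal: float, pulse_end_signal: float) -> float:
    """Complement of percent_remaining: the share of ingested rhodopsin digested."""
    return 100.0 - percent_remaining(chase_signal, pulse_end_signal)


if __name__ == "__main__":
    # Hypothetical densitometry values in arbitrary units, not data from this study.
    pulse_end = normalized_signal(rhodopsin_band=1200.0, loading_band=1000.0)
    after_chase = normalized_signal(rhodopsin_band=300.0, loading_band=1000.0)
    print(f"remaining: {percent_remaining(after_chase, pulse_end):.1f}%")
    print(f"degraded:  {percent_degraded(after_chase, pulse_end):.1f}%")
```

with this convention, a higher percentage remaining in sipnpla -transfected cells than in scr-transfected cells at the same chase time corresponds to less efficient degradation of the ingested rhodopsin.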
to determine the effect of pnpla knockdown on lipid and fatty acid levels in the arpe- cells fed pos membranes, we used electrospray ionization tandem mass spectrometry (esi/ms/ms) and gas chromatography-flame ionization detection to identify and quantify total lipids and fatty acid composition of the arpe- cells at . and h post pos feeding. our results did not show any significant differences in the intracellular lipid and fatty acid levels between the sipnpla knockdown cells and the scr and wt control cells at either . or h after pos addition (data not shown). taken together, these results demonstrate that digestion of pos protein and lipid components was impaired in pnpla -silenced arpe- cells undergoing phagocytosis. discussion here, we report that pedf-r is required for efficient degradation of pos by rpe cells after engulfment during phagocytosis. this conclusion is supported by the observed decrease in rhodopsin degradation, in fatty acid release and in β-hb production upon pos challenge when the pnpla gene is downregulated or the pedf-r lipase is inhibited. these observations were made in rpe cells in vivo, ex vivo and in vitro. the findings imply that rpe phagocytosis depends on pedf-r for the release of fatty acids from pos phospholipids to facilitate pos protein hydrolysis, thus identifying a novel contribution of this enzyme to pos degradation and, in turn, to the regulation of photoreceptor cell renewal. this is the first time that the pnpla gene has been studied in the context of rpe phagocytosis of pos. previously, we investigated its gene product, termed pedf-r, as a phospholipase-linked cell membrane receptor for pigment epithelium-derived factor (pedf), a retinoprotective factor encoded by the serpinf gene and produced by rpe cells. like rpe cells, non-inflammatory macrophages are phagocytic cells, but unlike rpe cells, they are found in all tissues, where they engulf and digest cellular debris, foreign substances, bacteria, other microbes, etc. 
the kratky laboratory reported data on the effects of pnpla silencing in efferocytosis obtained using pnpla -deficient mice (termed atgl-/- mice), and demonstrated that their macrophages have lower triglyceride hydrolase activity, higher triglyceride content, lipid droplet accumulation, and impaired phagocytosis of bacterial and yeast particles, and that in these cells, intracellular lipid accumulation triggers apoptotic responses and mitochondrial dysfunction. we have shown that pnpla gene knockdown causes rpe cells to be more responsive to oxidative stress-induced death. pnpla gene silencing, pedf-r peptides blocking ligand binding, and enzyme inhibitors abolish the activation of mitochondrial survival pathways by pedf in photoreceptors and other retinal cells. consistently, overexpression of the pnpla gene or exogenous additions of a pedf-r peptide decreases both the death of rpe cells undergoing oxidative stress and the accumulation of biologically detrimental leukotriene ltb levels. the fact that pedf is a ligand that enhances pedf-r enzymatic activity suggests that exposure of rpe to this factor is likely to enhance phagocytosis. this possibility remains untested and needs further study. exogenous additions of recombinant pedf protein to arpe- cells undergoing phagocytosis did not provide evidence for such enhancement (jb personal observations). this suggests that heterologous serpinf overexpression in cells and/or an animal model of inducible knock-in of serpinf may be useful to focus on the role of pedf/pedf-r in rpe phagocytosis unbiased by the endogenous presence of pedf. to investigate the consequences of pnpla silencing in pos phagocytosis, we generated a mouse model with a targeted deletion of pnpla in rpe cells in combination with the best-cre system for its exclusive conditional silencing in rpe cells (cko mouse). these mice are viable with no apparent changes in other organs or in weight compared with control littermates and wild type mice. 
the cko mice live to an advanced age, in contrast to the constitutively silenced pnpla -ko mice in which the lack of the gene causes premature lethality ( - weeks) due to heart failure associated with massive accumulation of lipids in cardiomyocytes. the rpe cells of the cko mouse have large lipid droplets at early and late age (figs. a, s ) consistent with a buildup of substrates for the lipase activities of the missing enzyme. in cko mice, lipid accumulation associates with lack of or the decreased thickness of the basal infoldings, granular cytoplasm, abnormal mitochondria and disorganized localization of organelles (mitochondria and melanosomes) in some rpe cells (fig. s ). taken together, the tem observations in combination with the greater rhodopsin accumulation and decline in β-hb release in cko mice support that pedf-r is required for lipid metabolism and phagocytosis in the rpe. however, interestingly, the observed features do not seem to affect photoreceptor functionality (fig. s ) and appear to be inconsequential to age-related retinopathies in the pnpla -cko mouse. this unanticipated observation suggests that the remaining rpe cells expressing pnpla gene probably complement activities of those lacking the gene, thereby lessening photoreceptor degeneration and dysfunction in the cko mouse. we note that the cko mouse has a mosaic expression pattern with non-cre-expressing rpe cells, as shown before for the best -cre transgenic line. at the same time, the erg measurements performed correspond to global responses of the photoreceptors and rpe cells, thereby missing individual cell evaluation. the lack of photoreceptor dysfunction with rpe lipid accumulation due to pnpla down regulation also suggests that during development a compensatory mechanism independent of pnpla /pedf-r is likely to be activated, thereby minimizing retinal degeneration in the cko mouse. further study will be required to understand the implications of these unexpected findings. 
animal models of constitutive heterozygous knockout or inducible knockdown of pnpla may be instrumental to address the role of pnpla /pedf-r in mature photoreceptors unbiased by low silencing efficiency or by compensatory mechanisms arising during development. results obtained from experiments using rpe cell cultures further establish that pedf-r deficiency affects phagocytosis. it is worth mentioning that the data obtained under our experimental conditions were essentially identical to those typically obtained in assays performed with cells attached to porous permeable membranes, and this provides an additional advantage to the field by requiring a shorter time to complete (see fig. s ). on the one hand, the decrease in the levels of β-hb and in the release of fatty acids (the breakdown products of phospholipids and triglycerides) upon pos ingestion by cells pretreated with bel, as well as by cells transfected with sipnpla , relative to the control cells indicates that pnpla participates in rpe lipid metabolism. on the other hand, the fact that pedf-r inhibition and pnpla down regulation impair the breakdown of rhodopsin from ingested pos in rpe cells implies that pos protein proteolysis likely depends on pedf-r-mediated phospholipid hydrolysis. in this regard, we envision that proteins in pos are mainly resistant to proteolytic hydrolysis because the surrounding phospholipids block the access of proteases for cleavage. phospholipase a activity would hydrolyze these phospholipids, likely liberating the proteins from the phospholipid membranes so that they become available to proteases, such as cathepsin d, an aspartic protease responsible for % of rhodopsin degradation. it is important to note that the findings cannot discern whether pedf-r is directly associated with the molecular pathway of rhodopsin degradation, or indirectly involved in downregulating cathepsin d or other proteases. 
it is also possible that pnpla deficiency results in the alteration of critical genes regulating the phagocytosis pathway, such as lc and genes of the mtor pathway. animal models deficient in such genes display retinal phenotypes such as impaired phagocytosis and lipid accumulation, similar to those observed in pedf-r deficient cells. these implications need further exploration. bel is an irreversible inhibitor of ipla and has therefore been used to discern the involvement of ipla in biological processes. previously, we demonstrated that bel at to µm blocks – % of the pla activity of human recombinant pedf-r. jenkins et al. showed that µm bel inhibits > % of the triolein lipase activity of human recombinant pedf-r (termed by this group as ipla ζ). in cell-based assays, wagner et al. showed that bel at µm inhibits % of this enzyme’s triglyceride lipase activity in hepatic cells. in the present study, to minimize cytotoxicity and ensure inhibition of the ipla activity of pedf-r in arpe- cells, we selected µm and µm bel concentrations that are below the ic determined for arpe- cell viability ( . µm bel; fig. a). we note that these bel concentrations are within the range used in an earlier study on arpe- cell phagocytosis. we compared our results to those by kolko et al. regarding bel effects on phagocytosis of arpe- cells. using alexa-red-labeled pos, they reported the percent of phagocytosis inhibition caused by – µm bel as % in arpe- cells. however, the authors did not specify the time of incubation for this experiment. based on the other experiments in the report, the time period may have lasted at least h of pulse, which implies inhibition of pos ingestion; the report did not describe the effects of bel on pos degradation. with unmodified pos in pulse-chase assays, our findings show a percent of inhibition after chase of > % for µm and µm bel, indicating more effective inhibition of pos digestion. the effect of bel on pos ingestion under . 
h of pulse was insignificant, and its effect over . h remains unknown. in addition, we show that pretreatment with bel results in a decrease in the release of β-hb, which is produced from the oxidation of fatty acids liberated from pos. thus, our assay provides new information (e.g., pulse-chase, use of unmodified pos, β-hb release) beyond that reported by kolko et al. we conclude that bel can impair phagocytic processes in arpe- cells. while bel is recognized as a potent inhibitor of ipla , it can also inhibit non-pla enzymes, such as magnesium-dependent phosphatidate phosphohydrolase and chymotrypsin. consequently, a complementary genetic approach targeting pedf-r is deemed reasonable and appropriate to investigate its role in rpe phagocytosis. the complex and highly regulated phagocytic function of the rpe also serves to protect the retina against lipotoxicity. by engulfing lipid-rich pos and using ingested fatty acids for energy, the rpe prevents the accumulation of lipids in the retina, particularly phospholipids, which could trigger cytotoxicity when peroxidized. in this regard, the lack of observed differences in intracellular phospholipids and fatty acids between pedf-r-deficient rpe and control cells leads us to speculate that, in arpe- cells exposed to pos, the undigested lipids remain within the cells and contribute to the total lipid and fatty acid pool, some of which may be converted to other lipid byproducts to protect against lipotoxicity. also, the duration of the in vitro chase is shorter than what pertains in vivo, where undigested pos accumulate and over time coalesce to form the large lipid droplets observed in the rpe. 
thus, future experiments aimed at detailed time-dependent characterization of specific lipid species and free fatty acid levels in the rpe in vivo, and in media and cells in vitro will allow us to have a better understanding of classes of lipids and fatty acids that contribute to the lipid droplet accumulation in the rpe in vivo due to pnpla deletion. nonetheless, a role of pedf-r in pos degradation agrees with the previously reported involvement of a phospholipase a activity in the rpe phagocytosis of pos , and with the role of providing protection of photoreceptors against lipotoxicity. in conclusion, this is the first study to identify a role for pedf-r in rpe phagocytosis. the findings imply that efficient rpe phagocytosis of pos requires pedf-r, thus highlighting a novel contribution of this protein in pos degradation and its consequences in the regulation of photoreceptor cell renewal. acknowledgements this work was supported by the intramural research program of the national eye institute, nih (project #ey ) to spb and by nih/nei r ey to mpa. we thank the nei animal house, histopathology, visual function, genetic engineering and biological imaging core facilities for technical support. we thank dr. hei sook sul, university of california, berkeley, for kindly providing sequences for primers of pnpla and the desnutrin flox mouse; dr. joshua dunaief, university of pennsylvania for kindly providing the transgenic tg(best - cre)jdun mouse model; dr. kathleen boesze-battaglia’s laboratory for kindly providing pos; drs. eugenia poliakov and sheetal uppal for help in isolating pos; dr. kiyoharu j miyagishima for performing the dcerg experiments; dr. preeti subramanian for technical assistance with cell culture and microscopy; and dr. ivan rebustini for proofreading the manuscript and providing feedback and reagents for rt-pcr. table . 
primers used for qrt-pcr
gene (human) | forward primer | reverse primer
pnpla | ’-agctcatccaggccaatgtct- ’ | ’-tgtctgaaatgccaccatcca- ’
s | ’-ggttgatcctgccagtag- ’ | ’-gcgaccaaaggaaccataac- ’
p and p | ’-gcttcaaacagcttcctcatg- ’ | ’-ggactttcggtcatagttccg- ’

table . antibodies used in the study
antibody | type & host | application | dilution | company | catalog number
gapdh | monoclonal mouse | wb | : , | genetex | gtx
pedf-r | polyclonal rabbit | wb if | : : | protein tech | - -ap
rhodopsin (a ) | monoclonal mouse | wb if | : : | novus biologicals | nbp -
rhodopsin (b ) | monoclonal mouse | if | : | novus biologicals | nbp -
cre recombinase | monoclonal rabbit | if | : | cell signaling technology |
alexa fluor goat anti-mouse igg (h+l) | | if | : | thermofisher scientific | a-
alexa fluor goat anti-rabbit igg (h+l) | | if | : | thermofisher scientific | a-
alexa fluor - phalloidin | | if | : | cell signaling technology |

table . sirna duplex sequences
sirna duplex | identifier | duplex sequences
sr a | a | rcrgrcrcrararargrcrarcrarurgrurararurarararurgct
sr b | b | rgrgrcrarcrarurarurargrararcrgrurarcrurgrcrarurucc
sr c | c | rgrcrcrurgrargrarcrgrcrcrurcrcrarururarcrcrarctg
sr a | d | rcrcrarargrururcrarururgrargrgrurarurcrurararaga
sr b | e | rcrurgrcrcrarcrurcrurarurgrargrcrururarargraraca
sr c | f | rcrururgrgrurarararurarararararcrgrararararurgtt

references
. goldman ai, teirstein ps, o’brien pj. the role of ambient lighting in circadian disc shedding in the rod outer segment of the rat retina. investigative ophthalmology & visual science. ; ( ): - . . lavail mm. circadian nature of rod outer segment disc shedding in the rat. investigative ophthalmology & visual science. ; ( ): - . . strauss o. the retinal pigment epithelium. physio rev. ; ( ): - . . kevany bm, palczewski k. phagocytosis of retinal rod and cone photoreceptors. physiology. ; ( ): - . doi: . /physiol. . . mazzoni f, safa h, finnemann sc. understanding photoreceptor outer segment phagocytosis: use and utility of rpe cells in culture. exp eye res. ; : - . doi: . /j.exer. . . . fliesler aj, anderson re. 
chemistry and metabolism of lipids in the vertebrate retina. progress in lipid research. ; ( ): - . doi: . / - ( ) - . chen h, anderson re. differential incorporation of docosahexaenoic and arachidonic acids in frog retinal pigment epithelium. journal of lipid research. ; ( ): - . . reyes-reveles j, dhingra a, alexander d, bragin a, philp nj, boesze-battaglia k. phagocytosis-dependent ketogenesis in retinal pigment epithelium. j biol chem. ; ( ): - . doi: . /jbc.m . . sangiovanni jp, chew ey. the role of omega- long-chain polyunsaturated fatty acids in health and disease of the retina. progress in retinal and eye research. ; ( ): - . doi: . /j.preteyeres. . . . obin ms, jahngen-hodge j, nowell t, taylor a. ubiquitinylation and ubiquitin-dependent proteolysis in vertebrate photoreceptors (rod outer segments): evidence for ubiquitinylation of gt and rhodopsin. journal of biological chemistry. ; ( ): - . doi: . /jbc. . . . palczewski k. g protein-coupled receptor rhodopsin. annu rev biochem. ; : - . doi: . /annurev.biochem. . . . strauss o, stumpff f, mergler s, wienrich m, wiederholt m. the royal college of surgeons rat: an animal model for inherited retinal degeneration with a still unknown genetic defect. cells tissues organs. ; ( - ): - . doi: . / . d’cruz pm, yasumura d, weir j, et al. mutation of the receptor tyrosine kinase gene mertk in the retinal dystrophic rcs rat. human molecular genetics. ; ( ): - . doi: . /hmg/ . . . inana g, murat c, an w, yao x, harris ir, cao j. rpe phagocytic function declines in age- related macular degeneration and is rescued by human umbilical tissue derived cells. j transl med. ; ( ): - . doi: . /s - - - . notari l, baladron v, aroca-aguilar jd, et al. identification of a lipase-linked cell membrane receptor for pigment epithelium-derived factor. journal of biological chemistry. ; ( ): - . doi: . /jbc.m . pham tl, he j, kakazu ah, jun b, bazan ng, bazan hep. 
defining a mechanistic link between pigment epithelium–derived factor, docosahexaenoic acid, and corneal nerve regeneration. journal of biological chemistry. ; ( ): - . doi: . /jbc.m . . subramanian p, locatelli-hoops s, kenealey j, desjardin j, notari l, becerra sp. pigment epithelium-derived factor (pedf) prevents retinal cell death via pedf receptor (pedf-r): identification of a functional ligand binding site. j biol chem. ; ( ): - . doi: . /jbc.m . . jenkins cm, mancuso dj, yan w, sims hf, gibson b, gross rw. identification, cloning, expression, and purification of three novel human calcium-independent phospholipase a family members possessing triacylglycerol lipase and acylglycerol transacylase activities. journal of biological chemistry. ; ( ): - . doi: . /jbc.m . villena ja, roy s, sarkadi-nagy e, kim k-h, sul hs. desnutrin, an adipocyte gene encoding a novel patatin domain-containing protein, is induced by fasting and glucocorticoids: ectopic expression of desnutrin increases triglyceride hydrolysis. journal of biological chemistry. ; ( ): - . doi: . /jbc.m . zimmermann r, strauss jg, haemmerle g, et al. fat mobilization in adipose tissue is promoted by adipose triglyceride lipase. science. ; ( ): . doi: . /science. . chandak pg, radovic b, aflaki e, et al. efficient phagocytosis requires triacylglycerol hydrolysis by adipose triglyceride lipase. j biol chem. ; ( ): - . doi: . /jbc.m . . kolko m, wang j, zhan c, et al. identification of intracellular phospholipases a in the human eye: involvement in phagocytosis of photoreceptor outer segments. investigative ophthalmology & visual science. ; ( ): - . doi: . /iovs. - . ahmadian m, abbott mj, tang t, et al. desnutrin/atgl is regulated by ampk and is required for a brown adipose phenotype. cell metab. ; ( ): - . doi: . /j.cmet. . . . iacovelli j, zhao c, wolkow n, et al. generation of cre transgenic mice with postnatal rpe- specific ocular expression. invest ophthalmol vis sci. ; ( ): - . doi: . /iovs. - . 
müllenbach r, lagoda p, welter c. an efficient salt-chloroform extraction of dna from blood and tissues. trends in genetics : tig. ; ( ): . . xin-zhao wang c, zhang k, aredo b, lu h, ufret-vincenty rl. novel method for the rapid isolation of rpe cells specifically for rna extraction and analysis. exp eye res. ; : - . doi: . /j.exer. . . . livak kj, schmittgen td. analysis of relative gene expression data using real-time quantitative pcr and the −ΔΔct method. methods. ; ( ): - . doi: . /meth. . . schertler gfx, hargrave pa. [ ] preparation and analysis of two-dimensional crystals of rhodopsin. in: methods in enzymology. vol . academic press; : - . doi: . /s - ( ) - . agbaga m-p, stiles ma, brush rs, et al. the elovl spinocerebellar ataxia- mutation t>g (p.w g) impairs retinal function in the absence of photoreceptor degeneration. molecular neurobiology. published online august , . doi: . /s - - - . lerman mj, lembong j, muramoto s, gillen g, fisher jp. the evolution of polystyrene as a cell culture material. tissue engineering part b: reviews. ; ( ): - . doi: . /ten.teb. . . subramanian p, mendez ef, becerra sp. a novel inhibitor of -lipoxygenase ( -lox) prevents oxidative stress-induced cell death of retinal pigment epithelium (rpe) cells. invest ophthalmol vis sci. ; ( ): - . doi: . /iovs. - . haemmerle g, lass a, zimmermann r, et al. defective lipolysis and altered energy metabolism in mice lacking adipose triglyceride lipase. science. ; ( ): . doi: . /science. . lavail mm. rod outer segment disc shedding in relation to cyclic lighting. experimental eye research. ; ( , part ): - . doi: . / - ( ) - . comitato a, subramanian p, turchiano g, montanari m, becerra sp, marigo v. pigment epithelium-derived factor hinders photoreceptor cell death by reducing intracellular calcium in the degenerating retina. cell death dis. ; ( ): - . doi: . /s - - -y . hernández-pinto a, polato f, subramanian p, et al. pedf peptides promote photoreceptor survival in rd retina models. 
experimental eye research. ; : - . doi: . /j.exer. . . . mayerson pl, hall mo. rat retinal pigment epithelial cells show specificity of phagocytosis in vitro. j cell biol. ; ( ): - . doi: . /jcb. . . . finnemann sc, rodriguez-boulan e. macrophage and retinal pigment epithelium phagocytosis: apoptotic cells and photoreceptors compete for alphavbeta and alphavbeta integrins, and protein kinase c regulates alphavbeta binding and cytoskeletal linkage. j exp med. ; ( ): - . doi: . /jem. . . . aflaki e, radovic b, chandak pg, et al. triacylglycerol accumulation activates the mitochondrial apoptosis pathway in macrophages. j biol chem. ; ( ): - . doi: . /jbc.m . . subramanian p, becerra sp. role of the pnpla gene in the regulation of oxidative stress damage of rpe. in: bowes rickman c, grimm c, anderson re, ash jd, lavail mm, hollyfield jg, eds. retinal degenerative diseases. springer international publishing; : - . . kenealey j, subramanian p, comitato a, et al. small retinoprotective peptides reveal a receptor-binding region on pigment epithelium-derived factor. j biol chem. ; ( ): - . doi: . /jbc.m . . rakoczy pe, baines m, kennedy cj, constable ij. correlation between autofluorescent debris accumulation and the presence of partially processed forms of cathepsin d in cultured retinal pigment epithelial cells challenged with rod outer segments. experimental eye research. ; ( ): - . doi: . /exer. . . dhingra a, bell ba, peachey ns, et al. microtubule-associated protein light chain b, (lc b) is necessary to maintain lipid-mediated homeostasis in the retinal pigment epithelium. front cell neurosci. ; : - . doi: . /fncel. . . cheng s-y, cipi j, ma s, et al. altered photoreceptor metabolism in mouse causes late stage age-related macular degeneration-like pathologies. proc natl acad sci u s a. ; ( ): - . doi: . /pnas. . go y-m, zhang j, fernandes j, et al. mtor-initiated metabolic switch and degeneration in the retinal pigment epithelium. the faseb journal. ; ( ): - . doi: . 
/fj. r . wagner c, hois v, pajed l, et al. lysosomal acid lipase is the major acid retinyl ester hydrolase in cultured human hepatic stellate cells but not essential for retinyl ester degradation. biochim biophys acta mol cell biol lipids. ; ( ): - . doi: . /j.bbalip. . . balsinde j, dennis ea. bromoenol lactone inhibits magnesium-dependent phosphatidate phosphohydrolase and blocks triacylglycerol biosynthesis in mouse p d macrophages. journal of biological chemistry. ; ( ): - . doi: . /jbc. . . . jenkins cm, han x, mancuso dj, gross rw. identification of calcium-independent phospholipase a (ipla ) β, and not ipla γ, as the mediator of arginine vasopressin- induced arachidonic acid release in a- smooth muscle cells: enantioselective mechanism-based discrimination of mammalian ipla s. journal of biological chemistry. ; ( ): - . doi: . /jbc.m . ueta t, inoue t, furukawa t, et al. glutathione peroxidase is required for maturation of photoreceptor cells. j biol chem. ; ( ): - . doi: . /jbc.m . . imai h, matsuoka m, kumagai t, sakamoto t, koumura t. lipid peroxidation-dependent cell death regulated by gpx and ferroptosis. in: nagata s, nakano h, eds. apoptotic and non-apoptotic cell death. springer international publishing; : - . doi: . / _ _ figure legends figure . generation of rpe-specific pnpla -cko mice. (a) scheme of pnpla floxed and cre- mediated recombined allele. the loxp sites flank exon . p and p are the primers homologous to sequences outside the floxed (flanked by the loxp sites) region used to detect cre-mediated recombination (generating recombined alleles) on genomic dna. the sizes of the amplicons obtained by pcr using p and p are indicated. (b) gel electrophoresis of pcr reaction products obtained using primers p and p and genomic dna isolated from mouse eyecups from either cko or control (ctr) mice (pnpla f/+); lane (mw) corresponds to molecular weight markers (generuler dna ladder mix). one eyecup per lane from a -month old mouse, n= cko, n= ctr. 
(c) pnpla expression (vs. hprt) in rpe from month-old cko (pnpla f/f/cre) relative to control littermates (pnpla f/f). each data point corresponds to the average of six pcr reactions per eyecup, six eyes from three cko mice and six eyes from three control mice at – months old. (d) cre (red) and phalloidin (yellow) labeling of rpe/choroid flatmounts from control (pnpla f/f) (left) and littermate cko (pnpla f/f/cre) (right). the scale corresponds to µm. (n= images from individual mouse eyecup at - months old). (e) plot of percentage of cre-positive rpe cells in cko animals (pnpla f/f/cre, n= , age was . - . months old) as indicated in x-axis. each data point corresponds to percentage of cre-positive rpe cells from an roi, each bar corresponds to a flatmount of an individual cko mouse, and the bar for control (pnpla f/f) has data from mice. figure . lipid accumulation in the rpe of pnpla -cko mice. electron microscopy micrographs showing the rpe structure of - (a) and (b) month-old cko mice and control animals. ld: lipid droplets; bi: basal infoldings. scale bar corresponds to µm. the representative images were selected among examinations of micrographs from eyes of cko (pnpla f/f cre+) mice, from eyes of control (pnpla f/f) mice at . - . -month-old; and from eyes of cko mice and eyes of control mice at . - -month-old. figure . phagocytosis and β-hydroxybutyrate production in the rpe of pnpla -cko mice. (a) representative roi of the eyecup from one control and one cko animal isolated at h ( am) and h ( am) post light onset ( am) after immunolabeling for rhodopsin (in green) phalloidin (in yellow) and cre (in red). the column to the right shows magnification of an area. the mean of rhodopsin immunolabel intensity in micrographs (n ≥ rois) from flatmounts (as indicated in x-axis) relative to control at h was determined among three mice per condition and shown in the plot. age of mice was . – . months. 
(b) ex-vivo β-hb release by the rpe of pnpla -cko eyecups upon ingestion of outer segments (os) in comparison to that of controls. eyecups were isolated at h ( am) and h ( pm) after light onset ( am). statistical significance was calculated using -way anova for the groups (controls and cko mice) with and without treatment (second variance) for each time after light onset (* p= . ; ** p= . ; *** p= . ); ns, not significant. (n = eyecups from control (f/+) mice at . months; n= eyecups from control (f/f cre-) mice at . months; n= eyecups from mice (f/f cre+) at . – . months) (c) the os-mediated increase in β-hb release above basal levels of the cko rpe/choroid explants was calculated from the data in panel (b) and plotted.
figure . rpe and retinal functionality in rpe-pnpla -cko mice. (a) histogram showing the amplitude (mean, standard deviation) of the c-wave, fast oscillation (fo), light peak (lp) and off-response (off) measured by dc-erg of -week-old cko (n= , empty histograms) and control mice (pnpla f/f and pnpla f/+, n= , filled histograms). (b) electroretinograms showing amplitude (y-axis) of scotopic a- and b-wave, and photopic b-wave, as a function of light intensity (x-axis) of and -month-old cko mice (empty circles) and littermate controls (pnpla f/f, filled circles) (n= /genotype).
figure . phagocytosis in arpe- cells pretreated with bel. (a) arpe- cells were incubated with bel at the indicated concentrations for . h. then the mixture was removed, and the cells were washed gently with pbs and incubated with complete medium for a total of h. cell viability was assessed by crystal violet staining with three replicates per condition. (b) representative immunoblot of total lysates of cells, which were pretreated with dmso alone, or µm bel/dmso for h prior to pulse-chase of pos, as described in methods. extracts of cells harvested at the indicated times (top of blot) were resolved by sds-page followed by immunoblotting with anti-rhodopsin.
migration position of rhodopsin is indicated to the right of the blot. (c) quantification of rhodopsin from total lysates of cells of the pulse-chase experiments as in panel (b). samples from each biological replicate were resolved in duplicate by sds-page from two experiments and single for the third experiment for quantification. intensities of the immunoreactive bands were determined and the percentage of the remaining rhodopsin after - h chase relative to rhodopsin at . h-pulse was plotted. (d) representative immunoblot of total lysates of cells, as in panel b to determine the effects of bel at h and h of chase (as indicated). (e) quantification of rhodopsin from two independent experiments of the pulse-chase experiments as in panel d. samples from each biological replicate were resolved in duplicate by sds-page for quantification. intensities of the immunoreactive bands were determined and the percentage of the remaining rhodopsin after -h chase relative to rhodopsin at . h-pulse was plotted. (f) cells were preincubated with dmso alone, or µm bel/dmso in ringer’s solution at °c for h. then, the mixture was removed, and cells were incubated with ringer’s solution containing mm glucose and pos ( x units/ml) with dmso alone, or µm bel/dmso for the indicated times (x-axis). media were removed to determine the levels of β- hb secretion, which were plotted (y-axis). (n= ) data are presented as means ± s.d. **p< . , ***p< . . figure . knockdown of pnpla in arpe- cells. arpe- cells were transfected with scr (scrambled sirna control) or sirnas targeting pnpla , and mrna levels and protein were tested. (a) rt-qpcr to measure pnpla mrna levels in arpe- cells h post-transfection with scr and six different sirnas (as indicated in the x-axis) was performed and a plot is shown. pnpla mrna levels were normalized to s. all sirna are represented as the percentage of the scrambled sirna control. 
n = . (b) a plot is shown for a time course of pnpla mrna levels following transfection with scr and sipnpla -c. n = . (c) rt-qpcr of mock-transfected cells, cells transfected with scr, and sipnpla -c (x-axis) at h after transfection. mrna levels were normalized to the s rna (y-axis). n = . (d) total protein was obtained from cells harvested h after transfection and resolved by sds-page followed by western blotting with anti-pnpla and anti-gapdh (loading control). the sirnas used in transfections are indicated at the top, and migration positions for pedf-r and gapdh are to the right of the blot. data are presented as means ± s.d. **p< . , ***p< . ***p< .
figure . phagocytosis and fatty acid metabolism in sipnpla cells. arpe- cells were transfected with scr or sirnas targeting pnpla . at h post-transfection, arpe- cells were incubated with pos ( x units/ml) in -well tissue culture plates for pulse-chase experiments. (a) representative immunoblot of total lysates of arpe- cells at . h, h, and . h of pos pulse and at a -h and -h chase period, as indicated at the top of the blot. proteins in cell lysates were subjected to immunoblotting with anti-rhodopsin followed by reprobing with anti-gapdh as the loading control. (b) quantification of rhodopsin from duplicate samples and blots of cell lysates from pulse-chase experiments and time periods (indicated in the x-axis) as from panel (a). intensities of the immunoreactive bands were determined and the percentage of the remaining rhodopsin after -h and -h chase relative to rhodopsin at . h-pulse was plotted (y-axis). data are presented as means ± s.d. ns, not significant, **p< . . (c-d) levels of secreted free fatty acids (c) and β-hb (d) were measured in culture media of cells transfected with scr or sipnpla following incubation with pos for the indicated periods of time (x-axis). (n = ) data are presented as means ± s.d. * p < . , **p< . . duplex sipnpla c was used to generate the data (see table for sequences of duplexes).
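The quantification described in the legends above (band intensity after the chase expressed as a percentage of the intensity at the end of the pulse, summarized as mean ± s.d.) can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function name `percent_remaining` and the intensity values are hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption, since the legends do not state which estimator was applied.

```python
from statistics import mean, stdev


def percent_remaining(pulse_intensities, chase_intensities):
    """Return (mean, s.d.) of chase-band intensities as % of the mean pulse-band intensity.

    Both arguments are lists of densitometry readings in arbitrary units.
    The s.d. is the sample standard deviation; it is 0.0 for a single replicate.
    """
    if not pulse_intensities or not chase_intensities:
        raise ValueError("at least one pulse and one chase reading are required")
    reference = mean(pulse_intensities)
    if reference <= 0:
        raise ValueError("mean pulse intensity must be positive")
    # Express every chase replicate relative to the pulse reference.
    percents = [100.0 * value / reference for value in chase_intensities]
    spread = stdev(percents) if len(percents) > 1 else 0.0
    return mean(percents), spread


# Hypothetical duplicate readings (arbitrary units), not data from the paper.
pulse = [1000.0, 1200.0]
chase = [220.0, 330.0]
avg, sd = percent_remaining(pulse, chase)
print(f"{avg:.1f} ± {sd:.1f} % rhodopsin remaining")
```

A lower percentage after the chase corresponds to more complete degradation of the internalized rhodopsin; the same normalization applies to any pulse and chase time points.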
supplementary information
figure s . the protein content of the pos samples was determined and the samples were resolved by sds-page in the same gel in two sets: one with µg and another with . µg protein per lane. for each set, one sample was non-reduced and the other was reduced with dtt. after electrophoresis, the gel was cut in half lengthwise. the gel portion with µg of protein was stained with coomassie blue and the other portion with . µg protein was transferred to a nitrocellulose membrane for immunostaining using anti-rhodopsin antibodies (as described in methods). photos of the stained gel and western blot are shown. the proteins of pos isolated from bovine retina had the expected migration pattern for both reduced and non-reduced conditions, and the main bands stained with coomassie blue comigrated with rhodopsin-immunoreactive proteins in western blots of pos proteins.
figure s . electron microscopy micrographs. panels a-j show electron microscopy micrographs of rpe structures of -month-old rpe cko prepared as described in the main text and figure . magnification is indicated for each image. the presence of lds was associated with a lack of (fig. s a) or decreased thickness of the basal infoldings, and with granular cytoplasm, abnormal mitochondria (fig. s b), and disorganized localization of organelles (mitochondria and melanosomes) (fig. s a). in some cells, the large lds crowded the cytoplasm and clustered together the mitochondria and melanosomes into the apical region of the cells (figs. s a, s c, s d); however, ld number and expansion within the cells appeared to be random and their expansion could proceed in any direction (fig. s e). normal apical cytoplasmic processes were lacking; however, degeneration in the outer segment (os) tips of the photoreceptors was visible (figs. s a, s f). additionally, normal phagocytosis of the os was lacking, indicating impaired rpe phagocytosis (figs. s a, s e, s g).
there were apparent unhealthy nuclei with pyknotic chromatin and leakage of extranuclear dna (endna), indicating that the beginning of the necrotic process had started (fig. s b). some rpe cells lacked basal infoldings, normally seen at the basal side (fig. s h). occasionally some rpe cells had lighter low-density cytoplasm indicating degeneration of cytoplasmic components in contrast to the denser and fuller cytoplasm in the rpe of the littermate control (fig. s i, s j). figure s . phagocytosis in arpe- cells. arpe- cells were cultured in -well plates for days, and then exposed to pos at x units/ml for up to a . -h pulse followed by a -h chase period as described in methods. (a) representative immunoblots of total cell lysates during pulse-chase (times indicated at the top of the blot) with anti-rhodopsin followed by reprobing with anti- gapdh as the loading control are shown. migration positions of rhodopsin and gapdh are indicated to the right of the blot. duplicate biological replicates were performed. (b) quantification of rhodopsin from duplicate samples per condition from pulse-chase experiments at time periods indicated in the x-axis as from panel (a). intensities of the immunoreactive bands from duplicate samples of cell lysates were determined. the percentage of the remaining rhodopsin after -h chase relative to rhodopsin at . h-pulse was plotted. (c-d) levels of free fatty acids (c) and β-hb (d) measured in culture media of cells incubated with and without pos for the indicated periods of time (x-axis) were plotted and shown. n = data are presented as means ± s.d. * p < . , ***p< . . figure s . phagocytosis in arpe- cells in porous membranes. arpe- cells were treated with x pos/ml. (a) representative immunoblot showing rhodopsin internalization from total cell lysates of arpe- cells following , , and min of pos incubation following plating in -well transwell inserts for weeks. 
cell extracts were resolved by sds-page followed by immunoblotting with anti-rhodopsin. the blot was stripped and reprobed with anti-gapdh as a loading control. (b) levels of β-hb secreted towards the apical membrane of arpe- cells following pos incubation for , , and min. (n = ) data are presented as means ± s.d. methods: to demonstrate a functional assay to study phagocytosis in arpe- cells, we performed the assay with confluent cells attached to porous membranes. arpe- cells seeded on porous membranes were incubated for weeks in culture media. then the media was removed and replaced with ringer’s solution alone or ringer’s solution containing x pos/ml and mm glucose for the indicated time points. rhodopsin was detected by western blotting. rhodopsin levels in the lysates of cells incubated with pos were detected in as little as min and up to . h following pos incubation, while rhodopsin was undetectable in cells without pos (fig. s a). β-hb levels released into the media of the apical chamber of transwells following pos incubation increased four-fold and three-fold after h and . h, respectively, while released β-hb levels from cells incubated with ringer’s solution alone did not increase (fig. s b).
figure s . arpe- cells were transfected with siscramble sirna control or sirnas targeting pnpla (sipnpla a). rt-qpcr to measure pnpla mrna levels in arpe- cells at (a) h post-transfection and (b) . h post-transfection, equivalent to pulse ( . h) and chase ( h), was performed with sirna duplexes (as indicated in the x-axis). treatment of cells in panel b was as for pulse-chase (see diagram in fig s ). pnpla mrna levels were normalized to s. n = biological replicates; each data point corresponds to the average of triplicate pcr reactions. the rt-pcr was repeated twice per biological replicate. values that fell outside the standard curve were not included in the plot.
the data shows that sipnpla duplex silenced pnpla in arpe- at h post-transfection and that silencing was maintained throughout a . h and pulse-chase of h.

[main figure images omitted; labels recoverable from the figure pages, in order of appearance]
figure . panels a-e. floxed allele and cre-recombined allele scheme; gel lanes mw, cko, ctr; cre-positive cells (%) per mouse, control and cko; cre/phalloidin; pnpla /hprt, control and cko.
figure . panels a-b. ld, bi; control and cko.
figure . panels a-c. rhodopsin/cre/phalloidin in control and cko at hours ( am) and hours ( am); rhodopsin (relative to control h) per mouse; β-hydroxybutyrate (nmol) for control, control +os, cko and cko+os versus time after light onset; ∆ β-hb (nmol), control and cko.
figure . panels a-b. amplitude (µv) of c-wave, fo, lp and off; amplitude (µv) of scotopic a-wave, scotopic b-wave and photopic b-wave versus light intensity log (cd/s.m ), control and cko.
figure . panels a-f. cell viability (abs nm) versus bel (µm); rhodopsin blots with bel (µm) over time (h); rhodopsin remaining (% relative to . h) versus time (h); β-hb (nmoles) versus time (h).
figure . panels a-d. pnpla / s (%) by sirna (scr, a-f); pnpla / s versus time (h), scr and sipnpla; pnpla / s by sirna (none, scr, pnpla); blot labeled gapdh and pedf-r, sirna none, scr, c, d, e.
figure . panels a-d. blot labeled gapdh and rhodopsin over time (h); rhodopsin remaining (% relative to . h) versus time (h); free fatty acids (pmol) versus time (h); β-hydroxybutyrate (nmol) versus time (h); scr and sipnpla.

supplementary figures
degradation of photoreceptor outer segments by the retinal pigment epithelium requires pigment epithelium-derived factor receptor (pedf-r)
jeanee bullock, federica polato, mones abu-asab, alexandra bernardo-colón, ivan rebustini, elma aflaki, martin-paul agbaga, s. patricia becerra

[supplementary figure images omitted; the legends are given under supplementary information above. recoverable titles and labels:]
figure s . sds-page and western blot of bovine pos. lanes: pos (µg), dtt - and +, mw; coomassie blue and ab-rhodopsin.
figure s . tem of rpe in rpe-pnpla -cko mice. panels a-j.
figure s . phagocytosis in arpe- cells. panels a-d. pulse-chase diagram (plate cells; + pos/ml; remove pos, add complete media; media → ffa, β-hb; cells → wb); blot labeled gapdh and rhodopsin; rhodopsin remaining (% relative to . h) versus time (h); free fatty acids (pmol) and β-hydroxybutyrate (nmol) versus time (h), - pos and + pos.
figure s . phagocytosis in arpe- cells in porous membranes (arpe- cells plated on porous membranes engulf bovine outer segments). panels: a. cells on porous membranes; b. cells on porous membranes, secreted β-hb (nmoles) versus time (min), - pos and + pos; c. cells on plastic.
figure s . panels: a. h post transfection; b. . h post transfection, parallel to pulse-chase. pnpla / s for siscramble and sirna a.

corresponding author: s. patricia becerra nih-nei-lrcmb section of protein structure and function bg. , rm.
center drive msc bethesda, md - becerrap@nei.nih.gov

lithium ions display weak interaction with amyloid-beta (aβ) peptides and have minor effects on their aggregation

elina berntsson , , suman paul , faraz vosough , sabrina b. sholts , jüri jarvet , , per m. roos , , andreas barth , astrid gräslund , sebastian k. t. s. wärmländer ,*

department of biochemistry and biophysics, stockholm university, sweden.
department of chemistry and biotechnology, tallinn university of technology, estonia.
department of anthropology, national museum of natural history, smithsonian institution, washington, dc, usa.
the national institute of chemical physics and biophysics, tallinn, estonia.
institute of environmental medicine, karolinska institutet, stockholm, sweden.
department of clinical physiology, capio st. göran hospital, stockholm, sweden.

* correspondence: seb@dbb.su.se; tel.: +

abstract: alzheimer’s disease (ad) is an incurable disease and the main cause of age-related dementia worldwide, despite decades of research. treatment of ad with lithium (li) has shown promising results, but the underlying mechanism is unclear.
the pathological hallmark of ad brains is deposition of amyloid plaques, consisting mainly of amyloid-β (aβ) peptides aggregated into amyloid fibrils. the plaques also contain metal ions, e.g. of cu, fe, and zn, and such ions are known to interact with aβ peptides and modulate their aggregation and toxicity. the interactions between aβ peptides and li+ ions have, however, not been well investigated. here, we use a range of biophysical techniques to characterize in vitro interactions between aβ peptides and li+ ions. we show that li+ ions display weak and non-specific interactions with aβ peptides, and have minor effects on aβ aggregation. these results indicate that possible beneficial effects of li on ad pathology are not likely caused by direct interactions between aβ peptides and li+ ions.

key words: alzheimer’s disease; protein aggregation; metal-protein binding; neurodegeneration; pharmaceutics

running title: li+ ions have minor effects on aβ aggregation

biorxiv preprint doi: https://doi.org/ . / . . . ; this version posted january , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license (http://creativecommons.org/licenses/by-nc-nd/ . /).

introduction
alzheimer’s disease (ad) is still an incurable disease and the main cause of age-related dementia worldwide (querfurth & laferla, ; prince et al., ; frozza et al., ), despite decades of research on putative drugs (luo et al., ; wärmländer et al., ; decker & munoz-torrero, ; kisby et al., ).
in addition to signs of neuroinflammation and oxidative stress (agostinho et al., ; al-hilaly et al., ; wang et al., ; heppner et al., ; regen et al., ), ad brains display characteristic lesions in the form of intracellular neurofibrillary tangles, consisting of aggregated hyperphosphorylated tau proteins (goedert, ; gibbons et al., ), and extracellular amyloid plaques, consisting mainly of insoluble fibrillar aggregates of amyloid-β (aβ) peptides (glenner & wong, ; querfurth & laferla, ). these aβ fibrils and plaques are the end-product of an aggregation process (querfurth & laferla, ; luo et al., ; selkoe & hardy, ) that involves extra- and/or intracellular formation of intermediate, soluble, and likely neurotoxic aβ oligomers (luo et al., ; selkoe & hardy, ; sengupta et al., ; lee et al., ) that can spread from neuron to neuron via exosomes (nath et al., ; sardar sinha et al., ). the aβ peptides comprise - residues and are intrinsically disordered in aqueous solution. they have limited solubility in water due to the hydrophobicity of the central and c-terminal aβ segments, which may fold into a hairpin conformation upon aggregation (abelein et al., ; baronio et al., ). the charged n-terminal segment is hydrophilic and readily interacts with cationic molecules and metal ions (luo et al., ; luo et al., ; tiiman et al., ; wallin et al., ; wallin et al., ; owen et al., ; wallin et al., ), while the hydrophobic c-terminal segment can interact with membranes where aβ may exert its toxicity (österlund et al., ; wärmländer et al., ). the interactions between aβ and metal ions are of particular interest (duce et al., ; wärmländer et al., ; mital et al., ; wärmländer et al., ; wallin et al., ), as altered metal concentrations indicative of metal dyshomeostasis are a prominent feature in the brains and fluids of ad patients (wang et al., ; szabo et al., ), and because ad plaques contain elevated amounts of metal ions of e.g. cu, fe, and zn (beauchemin & kisilevsky, ; lovell et al., ; miller et al., ). interestingly, although the role of metal ions in ad pathogenesis remains debated (duce et al., ; modgil et al., ; chin-chan et al., ; mital et al., ; adlard & bush, ; huat et al., ; wärmländer et al., ), monovalent ions of the alkali metal lithium [i.e., li+ ions] may provide beneficial effects to patients with neurodegenerative disorders such as amyotrophic lateral sclerosis (als) (fornai et al., ; morrison et al., ) or ad (engel et al., ; mauer et al., ; sutherland & duthie, ; decker & munoz-torrero, ; donix & bauer, ; morris & berk, ; kerr et al., ; hampel et al., ; kisby et al., ; priebe & kanzawa, ). lithium salts are commonly used in psychiatric medication, even though it is not understood how the li+ ions affect the molecular mechanisms underlying the psychiatric disorders (dell'osso et al., ). unlike other pharmaceuticals, li+ is widely non-selective in its biochemical effects, possibly due to its general propensity to inhibit the many enzymes that have magnesium as a cofactor (ge & jakobsson, ). cell and animal studies have provided clues regarding how li+ ions may affect ad pathology (nery et al., ; sofola-adesakin et al., ; zhao et al., ; budni et al., ; habib et al., ; cardillo et al., ; kerr et al., ; habib et al., ; rocha et al., ; wilson et al., ). due to its ability to down-regulate translation, li+ caused a reduction in protein synthesis and thus aβ levels in an adult-onset drosophila model of ad (sofola-adesakin et al., ).
li+ reduces aβ production by affecting the processing/cleavage of the amyloid-β precursor protein (aβpp) in cells and mice, presumably by down-regulating the levels of phosphorylated app. a main target of li+ is the glycogen synthase kinase -beta (gsk- β) (ryves & harwood, ) which is implicated in ad pathogenesis (caccamo et al., ; forlenza et al., ). in aβpp-transgenic mice, reduced activation of the gsk- β enzyme was associated with decreased levels of app phosphorylation that resulted in decreased aβ production (rockenstein et al., ). one study on mice with traumatic brain injury reported that li+-treatment improved spatial learning and reduced aβ production, possibly by reducing the levels of both aβpp and the aβpp-cleaving enzyme bace (yu et al., ). more recent mouse studies have reported that treatment with li+ ions improved aβ clearance from the brain (pan et al., ), reduced oxidative stress levels (xiang et al., ), improved spatial memory (habib et al., ), and reduced the amounts of aβ plaques and phosphorylated tau while also improving spatial memory (liu et al., ). however, only a few studies have investigated how li+ ions could affect the molecular events that appear to underlie ad pathology, such as aβ aggregation. one study showed that increased ionic strength, i.e. mm of naf, nacl, or licl, significantly accelerated the kinetics of aβ amyloid formation, by promoting surface-catalyzed secondary nucleation reactions (abelein et al., ).
another recent study used molecular dynamics simulations to find small but distinct differences in how the three monovalent ions li+, na+, and k+ interact with aβ oligomers (huraskin & horn, ). the therapeutic effect of li+ on aβ plaque quality and toxicity has been reported in mice, where li+ treatment before pathology onset induced smaller plaques with higher aβ compaction, reduced oligomeric-positive halo, and attenuated capacity to induce neuronal damage (trujillo-estrada et al., ). one hypothesis is that these neuroprotective effects of li+ could be mediated by modifications of the plaque toxicity through the astrocytic release of heat shock proteins (trujillo-estrada et al., ).

here, we use a range of biophysical techniques to characterize the in vitro interactions between li+ ions and aβ peptides, and how such interactions affect the aβ amyloid aggregation processes and fibril formation.

materials and methods

sample preparation

recombinant aβ peptides were purchased from alexotech ab (umeå, sweden) in either unlabeled or uniformly n-labeled form. the lyophilized peptides were stored at - °c. samples were dissolved to monomeric form immediately before each measurement. the peptides were first dissolved in mm naoh, and then sonicated in an ice-bath to avoid having pre-formed aggregates in the sample solutions. next, the samples were diluted in mm buffer of either sodium phosphate or mes ( -[n-morpholino]ethanesulfonic acid). all preparation steps were performed on a bed of ice, and the peptide concentration was determined by weight.
licl salt was purchased from merck & co. inc. (usa), and mes hydrate was purchased from sigma-aldrich (usa). synthetic aβ peptides were purchased from jpt peptide technologies (germany) and used to prepare monomeric solutions via size exclusion chromatography. mg of lyophilized aβ powder was dissolved in ml dmso. a sephadex g- hitrap desalting column (ge healthcare, uppsala) was equilibrated with mm naoh solution (ph= . ), and washed with - ml of mm naod, pd= . (glasoe & long, ) solution. the peptide solution in dmso was applied to the column, followed by injection of . ml of mm naod. collection of peptide fractions in mm naod on ice was started at a mg/ml flow rate. ten fractions of ml volumes were collected in . ml eppendorf tubes. the absorbance of each fraction at nm was measured with a nanodrop instrument (eppendorf, germany), and peptide concentrations were determined using a molar extinction coefficient of m- cm- for the single tyr in aβ (edelhoch, ). the peptide fractions were flash-frozen in liquid nitrogen, covered with argon gas on top in . ml eppendorf tubes, and stored at - °c until used. sodium dodecyl sulfate (sds)-stabilized aβ oligomers of two well-defined sizes (approximately tetramers and dodecamers) were prepared according to a previously published protocol (barghorn et al., ), but in d o, at -fold lower peptide concentration and without the original dilution step (vosough & barth, manuscript). the reaction mixtures ( µm aβ in pbs and containing . % or .
% sds) were incubated together with - mm licl at °c for hours, and then flash-frozen in liquid nitrogen and stored at - °c for later analysis.

thioflavin t kinetics

a fluostar omega microplate reader (bmg labtech, germany) was used to monitor the effect of li+ ions on aβ aggregation kinetics. µm monomeric aβ peptides were incubated in mm mes buffer, ph . , together with different concentrations of licl ( , μm, μm, μm) and μm thioflavin t (tht). tht is a fluorescent benzothiazole dye, and its fluorescence intensity increases when bound to amyloid aggregates (gade malmos et al., ). samples were placed in a -well plate with a sample volume of µl per well, and four replicates were measured per li+ concentration. the temperature was + °c, the tht dye was excited at nm, and the tht fluorescence emission at nm was measured every five minutes. each five-minute cycle involved seconds of shaking at rpm. the samples were incubated for a total of hours, and the assay was repeated three times. to derive parameters for the aggregation kinetics, the tht fluorescence curves were fitted to the sigmoidal equation : (eq. ) where f and f∞ are the intercepts of the initial and final fluorescence intensity baselines, m and m∞ are the slopes of the initial and final baselines, t½ is the time needed to reach halfway through the elongation phase (i.e., aggregation half-time), and τ is the elongation time constant (gade malmos et al., ). the apparent maximum rate constant, rmax, for the growth of fibrils is given by /τ.
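as a rough illustration of the sigmoidal model described above, the sketch below evaluates a curve with sloping initial and final baselines and derives an apparent maximum rate from the elongation time constant. the functional form F(t) = F0 + m0·t + (F∞ + m∞·t) / (1 + exp(−(t − t½)/τ)) is an assumption reconstructed from the parameter definitions in the text, not a copy of the original equation. the function names, the relation r_max = 1/τ, and all numerical values are likewise hypothetical and are not measured data.

```python
import math


def tht_sigmoid(t, f0, m0, f_inf, m_inf, t_half, tau):
    """Sigmoidal ThT curve with linear baselines (assumed functional form).

    f0, m0       : intercept and slope of the initial baseline
    f_inf, m_inf : intercept and slope of the final baseline term
    t_half       : aggregation half-time
    tau          : elongation time constant
    """
    # Logistic transition that goes from 0 (lag phase) to 1 (plateau).
    transition = 1.0 / (1.0 + math.exp(-(t - t_half) / tau))
    return f0 + m0 * t + (f_inf + m_inf * t) * transition


def r_max(tau):
    """Apparent maximum growth rate, assumed here to be 1/tau."""
    return 1.0 / tau


# Hypothetical parameters chosen only to exercise the functions.
params = dict(f0=100.0, m0=0.0, f_inf=900.0, m_inf=0.0, t_half=5.0, tau=0.5)
curve = [tht_sigmoid(t, **params) for t in (0.0, 5.0, 10.0)]
rate = r_max(params["tau"])
```

with flat baselines the curve passes through the midpoint between the two plateaus at t = t½, which is a quick sanity check on any fitted parameter set. in practice the parameters would be obtained by nonlinear least squares against the measured traces.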
tyrosine fluorescence quenching

the binding affinity between aβ peptides and li+ ions was evaluated from cu +/li+ binding competition experiments (wallin et al., ). the affinity of the cu +·aβ complex was measured via the quenching effect of cu + ions on the intrinsic fluorescence of y , which is the only fluorophore in native aβ peptides. the fluorescence emission intensity at nm (excitation wavelength nm) was recorded at °c using a jobin yvon horiba fluorolog fluorescence spectrophotometer (longjumeau, france). the titrations were carried out by consecutive additions of . – . µl aliquots of either , , or mm stock solutions of cucl to µl of µm aβ in mm mes buffer, ph . , in a quartz cuvette with mm path length. after each addition of cucl the solution was stirred for seconds before recording fluorescence emission spectra. copper titrations were conducted for aβ samples both in the absence and the presence of mm licl. the dissociation constant of the cu +·aβ complex was determined by fitting the cu + titration data to equation : (eq. ) where i is the initial fluorescence intensity without cu + ions, i∞ is the steady-state (saturated) intensity at the end of the titration series, [aβ] is the peptide concentration, [cu] is the concentration of added cu + ions, kd is the dissociation constant of the cu +·aβ complex, and k is a constant accounting for the concentration-dependent quenching effect induced by free (non-bound) cu + ions that may collide with the y residue (lindgren et al., ). this model assumes a single binding site.
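the single-site assumption stated above can be sketched numerically. the code below computes the concentration of a 1:1 complex from total peptide and total metal concentrations using the standard quadratic solution of the mass-action law, then interpolates the fluorescence between the free and saturated intensities. this is a minimal sketch and not the fitting equation of the original work: the collisional quenching term with the constant k is deliberately omitted because its exact form is not recoverable here, and the function names and numerical values are hypothetical.

```python
import math


def complex_concentration(peptide_total, metal_total, kd):
    """Concentration of a 1:1 complex (same units as the inputs).

    Solves kd = [P_free][M_free] / [PM] for [PM], taking the physically
    meaningful (smaller) root of the resulting quadratic.
    """
    s = peptide_total + metal_total + kd
    return (s - math.sqrt(s * s - 4.0 * peptide_total * metal_total)) / 2.0


def quenched_intensity(i0, i_inf, peptide_total, metal_total, kd):
    """Fluorescence interpolated between free (i0) and saturated (i_inf) states.

    Assumption: intensity scales linearly with the bound fraction, and the
    quenching by free metal ions (the k term in the text) is left out.
    """
    bound = complex_concentration(peptide_total, metal_total, kd) / peptide_total
    return i0 + (i_inf - i0) * bound


# Hypothetical values, for illustration only (all concentrations in µM).
peptide, metal, kd = 10.0, 10.0, 2.5
c = complex_concentration(peptide, metal, kd)
```

fitting kd then amounts to adjusting kd, i0 and i∞ until `quenched_intensity` matches the titration series. a competing ion that binds at the same site would raise the apparent kd, which is the logic behind the competition experiment.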
as no corrections for buffer conditions are made, i.e. in terms of possible interactions between the metal ions and the buffer, the calculated dissociation constant should be considered to be apparent.

atomic force microscopy imaging

samples of µm aβ in mm mes buffer (total volume µl, ph . ) with either , µm, µm or mm licl were put in small eppendorf tubes and incubated for hours at °c under continuous shaking at rpm. a droplet ( µl) of incubated solution was then placed on a fresh silicon wafer (siegert wafer gmbh, germany) and left to dry for minutes. next, µl of milli-q h o was carefully added to the semi-dried sample droplet and soaked up immediately with a lint-free wipe, to remove excess salts in a mild manner. the wafer was left to dry in a covered container to protect it from dust, and atomic force microscopy (afm) images were recorded on the same day. a neasnom scattering-type near-field optical instrument (neaspec gmbh, germany) was used to collect the afm images under tapping mode (Ω: khz, tapping amplitude - nm) using pt/ir-coated monolithic arrow-ncpt si tips (nanoandmore gmbh, germany) with tip radius < nm. images were acquired on . x . µm scan-areas ( x -pixel size) under optimal scan-speed (i.e., . ms/pixel). the recorded images were minimally processed using the gwyddion software where a basic plane leveling was performed (nečas & klapetek, ).
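the plane leveling mentioned above can be illustrated with a generic least-squares plane subtraction. the sketch below is not the gwyddion implementation; it only shows the underlying idea on a small rectangular height map stored as a list of rows. the function name `plane_level` and the test surface are hypothetical. on a complete rectangular grid the centered x and y coordinates are orthogonal, so the two slopes can be estimated independently.

```python
def plane_level(height):
    """Subtract the least-squares plane from a rectangular height map.

    height : list of rows, each row a list of floats (all rows equal length).
    Returns a new list of rows with mean height and overall tilt removed.
    """
    rows, cols = len(height), len(height[0])
    x_mean = (cols - 1) / 2.0
    y_mean = (rows - 1) / 2.0
    z_mean = sum(sum(row) for row in height) / (rows * cols)

    # Sums of squares of the centered pixel coordinates.
    sxx = rows * sum((x - x_mean) ** 2 for x in range(cols))
    syy = cols * sum((y - y_mean) ** 2 for y in range(rows))

    # Covariances of height with each centered coordinate.
    sxz = sum((x - x_mean) * height[y][x] for y in range(rows) for x in range(cols))
    syz = sum((y - y_mean) * height[y][x] for y in range(rows) for x in range(cols))

    slope_x = sxz / sxx if sxx else 0.0
    slope_y = syz / syy if syy else 0.0

    return [
        [height[y][x] - (z_mean + slope_x * (x - x_mean) + slope_y * (y - y_mean))
         for x in range(cols)]
        for y in range(rows)
    ]


# Hypothetical tilted, featureless surface: leveling should flatten it to zero.
tilted = [[2.0 + 0.5 * x - 0.25 * y for x in range(5)] for y in range(4)]
leveled = plane_level(tilted)
```

because only a mean plane is removed, fibril heights measured relative to the substrate are preserved, which is consistent with describing the processing as minimal.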
nuclear magnetic resonance spectroscopy

an avance mhz nuclear magnetic resonance (nmr) spectrometer (bruker inc., usa) equipped with a cryoprobe was used to investigate possible interactions between li+ ions and monomeric aβ peptides at the atomic level. d h- n-hsqc spectra of . μm monomeric n-labeled aβ peptides were recorded at °c with / h o/d o, either in mm mes buffer at ph . or in x pbs buffer ( mm nacl, . mm kcl, and mm phosphate ph . ), before and after additions of licl. diffusion measurements were performed on a sample of μm unlabeled monomeric aβ peptide in mm sodium phosphate buffer, % d o, pd . , at °c, before and after additions of licl dissolved in d o. the diffusion experiments employed pulsed field gradients (pfgs) according to previously described methods (danielsson et al., ), and methyl group signals between . - . ppm were integrated, evaluated, and corrected for the viscosity of d o at °c (cho et al., ). all nmr data were processed with the topspin version . . software, and the hsqc crosspeak assignment for aβ in buffer is known from previous studies (danielsson et al., ).

circular dichroism spectroscopy

circular dichroism (cd) spectra of μm aβ peptides in mm sodium phosphate buffer, ph . , were recorded at °c using a chirascan cd spectrometer (applied photophysics, uk) and a quartz cuvette with an optical path length of mm. measurements were done between − nm, with a step size of nm and a sampling time of s per data point. first, a spectrum was recorded for aβ alone. next, micelles of mm sds were added to create a membrane-mimicking environment. finally, licl was titrated to the sample in steps up to a concentration of µm.
blue native polyacrylamide gel electrophoresis

homogeneous solutions of µm aβ oligomers prepared in the presence and absence of – mm li+ ions were analyzed with blue native polyacrylamide gel electrophoresis (bn-page) using the invitrogen system. - % bis-tris novex gels (thermofisher scientific, usa) were loaded with µl of aβ oligomer samples alongside the amersham high molecular weight calibration kit for native electrophoresis (ge healthcare, usa). the gels were run at °c using the electrophoresis system according to the invitrogen instructions (thermofisher scientific, usa), and then stained using the pierce silver staining kit according to the instructions (thermofisher scientific, usa).

infrared spectroscopy

fourier-transformed infrared (ftir) spectra of aβ oligomers were recorded in transmission mode on a tensor ftir spectrometer (bruker optics, germany) equipped with a sample shutter and a liquid nitrogen-cooled mct detector. the unit was continuously purged with dry air during the measurements. - µl of the µm aβ oligomer samples, containing – mm licl, were put between two flat caf discs separated by a µm plastic spacer covered with vacuum grease at the periphery. the assembled discs were mounted in a holder inside the instrument’s sample chamber. the samples were allowed to sit for at least minutes after closing the chamber lid, to avoid interference from co and h o vapor. ftir spectra were recorded at room temperature in the - cm- range, with scans for both background and sample spectra, using a mm aperture and a resolution of cm- . the light intensities above cm- and below cm- were blocked with a germanium filter and a cellulose membrane, respectively (baldassarre & barth, ). the spectra were analyzed and plotted with the opus . software, and second derivatives were calculated with a cm- smoothing range.
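to illustrate the kind of processing described for the ftir spectra, the sketch below smooths an evenly sampled spectrum and takes a central-difference second derivative, in which absorption bands show up as negative minima. the smoothing algorithm used by the opus software is not stated in the text, so a simple boxcar average is swapped in here as an assumption. the function names and the synthetic gaussian band are hypothetical.

```python
import math


def moving_average(y, window):
    """Boxcar smoothing with an odd window; edge points use a shortened window."""
    half = window // 2
    smoothed = []
    for i in range(len(y)):
        lo, hi = max(0, i - half), min(len(y), i + half + 1)
        smoothed.append(sum(y[lo:hi]) / (hi - lo))
    return smoothed


def second_derivative(y, step):
    """Central-difference second derivative of evenly spaced samples.

    The first and last points are dropped, so element j of the result
    corresponds to element j + 1 of the input.
    """
    return [
        (y[i - 1] - 2.0 * y[i] + y[i + 1]) / (step * step)
        for i in range(1, len(y) - 1)
    ]


# Synthetic single band (arbitrary position and width, not measured data).
wavenumbers = [1600.0 + i for i in range(101)]
absorbance = [math.exp(-((w - 1650.0) ** 2) / (2.0 * 8.0 ** 2)) for w in wavenumbers]

d2 = second_derivative(moving_average(absorbance, 5), step=1.0)
band_position = wavenumbers[d2.index(min(d2)) + 1]
```

a wider smoothing window suppresses noise but also broadens and weakens the second-derivative minima, so band widths read from such spectra depend on the smoothing range chosen.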
results

tht fluorescence: influence of li+ ions on aβ aggregation

the fluorescence intensity of the amyloid-marker molecule tht was measured when µm aβ samples were incubated for hours together with different concentrations of licl (fig. ). fitting eq. to the tht fluorescence curves yielded the kinetic parameters t½ (aggregation half-time) and rmax (maximum aggregation rate) (fig. ; table ). for µm aβ alone, the aggregation half-time is approximately . hours under the experimental conditions used, and the maximum aggregation rate is . hours- (table ). these kinetic parameters are not much affected by addition of licl in : or : li+:aβ ratios. at the li+:aβ ratio of : , the rmax value remains largely unaffected while the aggregation half-time is increased to almost hours (fig. ; table ). the observation that a li+:aβ ratio of : is required to shift the tht curve clearly shows that li+ ions do not have a strong effect on the aβ aggregation kinetics.

afm imaging: effects of li+ ions on the morphology of aβ aggregates

afm images (fig. ) were recorded for the aggregation products of µm aβ peptide, incubated for three days without or with licl. the control sample without li+ displays long (> µm) amyloid fibrils that are around nm thick, together with small (< nm) aggregate particles that may be protofibrils (fig. a). the distribution and sizes of these aggregates are rather typical for aβ aggregates formed in vitro (luo et al., ).
the aβ samples incubated in the presence of different concentrations of li+ ions display amyloid fibrils of similar size and shape, although these fibrils are more densely packed and appear to be more numerous (figs. b, c, d). compared to the control sample, there are fewer small (< nm) aggregate particles in the samples incubated together with li+ ions in : and : li+:aβ ratios. this suggests that li+ ions may induce some differences in the aβ aggregation process.

nmr spectroscopy: interactions between li+ ions and aβ monomers

high-resolution liquid phase nmr experiments were conducted to investigate if residue-specific molecular interactions could be observed between li+ ions and monomeric aβ peptides. d h- n-hsqc spectra showing the amide crosspeak region for . μm monomeric n-labeled aβ peptides are presented in fig. a, before and after addition of licl in : , : , and : aβ:li+ ratios in mm mes buffer, ph . . addition of li+ ions induces loss of signal intensity mainly for amide crosspeaks corresponding to residues in the n-terminal half of the peptide, indicating selective li+ interactions in this region (fig. b). the effects are clearly concentration-dependent. because li+ ions are not paramagnetic, this loss of signal intensity is arguably caused by chemical exchange related to structural rearrangements induced by the li+ ions. as no chemical shift changes are observed for the crosspeak positions (fig. a), these li+-induced secondary structures appear to be short-lived. figs. c and d show the results of similar experiments carried out in x pbs buffer, i.e. mm nacl, .
mm kcl, and mm phosphate ph . . here, the li+ ions induce virtually no changes in the crosspeak intensities, showing that the weak li+/aβ interactions observed in pure mes buffer (figs. a,b) disappear when the buffer and ionic strength correspond to physiological conditions. diffusion measurements were carried out for μm aβ peptides in d o, before and after addition of licl in : , : , and : li+:aβ ratios. addition of : li+ produces an increase in the aβ diffusion rate by around %, i.e. from . · - m /s to . · - m /s (figs. a and b). this somewhat faster diffusion is likely caused by the aβ peptide adopting a slightly more compact structure in the presence of li+ ions, an effect similar to that previously reported for zinc ions (abelein et al., ). addition of even higher li+ concentrations – and times the aβ concentration – produces diffusion rates that are similar to, but slightly lower than, the diffusion rate measured for the : li+:aβ ratio, i.e. . · - m /s and . · - m /s, respectively (figs. c and d), indicating that the effect of li+ on the aβ secondary structure and diffusion has been saturated.

fluorescence spectroscopy: li+ binding affinity to the aβ monomer

binding affinities for metal ions to aβ peptides can often be measured via the quenching effect on the intrinsic fluorescence of y , the only fluorophore in native aβ peptides. however, not all metal ions interfere with tyrosine fluorescence, and initial experiments showed that addition of li+ ions does not affect the aβ fluorescence.
the binding affinity of li+ ions to aβ was therefore evaluated from binding competition experiments with cu + ions (danielsson et al., ; wallin et al., ), which induce much stronger tyrosine fluorescence quenching when bound to the peptide than when free in the solution (lindgren et al., ). fig. shows the results of titrating cucl to aβ , both in the absence (red circles) and in the presence (blue triangles) of mm licl. three titrations were carried out for each condition, producing apparent kd values for the cu +·aβ complex of . µm, . µm, and . µm without licl (on average . ± . µm), and . µm, . µm, and . µm with licl present (on average . ± . µm). the obtained values are in line with earlier fluorescence measurements of the cu + binding affinity to the aβ peptide, although this affinity is known to vary with the ph, the buffer, and other experimental conditions (ghalebani et al., ; alies et al., ). the difference between the average measured kd values is not significant at the % level with a two-tailed t-test, which indicates that li+ ions are not able to compete with cu + for binding to aβ. thus, the li+ binding affinity for aβ is likely in the millimolar range, or weaker.

cd spectroscopy: effects of li+ ions on aβ structure in sds

although aβ peptides are generally disordered in aqueous solutions, they adopt an α-helical secondary structure in membranes and membrane-mimicking environments such as sds micelles (tiiman et al., ; Österlund et al., ).
thus, the cd spectrum for aβ in sodium phosphate buffer displays the characteristic minimum for random coil structure at nm (fig. ). addition of mm sds, which is well above the critical concentration for micelle formation (Österlund et al., ), induces an α-helical structure with characteristic minima around and nm. titrating licl in concentrations up to µm to the aβ sample slightly increases the general cd intensity, but does not change the overall spectral shape: the minima remain at their respective positions. the intensity changes are not caused by dilution of the sample during the titration, as the added volumes are very small, and as dilution would decrease rather than increase the cd intensity. the observed changes in cd intensity therefore suggest a small but distinct binding effect of li+ ions. this binding effect appears to be much weaker than the structural rearrangements and aβ coil-coil interactions previously reported to be induced by cu + ions (tiiman et al., ).

bn-page: effects of li+ ions on aβ oligomer formation and stability

well-defined and sds-stabilized aβ oligomers were prepared in the presence of different amounts of licl. sds treatment of aβ peptides at low concentrations (≤ mm) leads to formation of stable and homogeneous aβ oligomers of certain sizes and conformations (barghorn et al., ; rangachari et al., ). as shown in fig. , two sizes of aβ oligomers are formed in the presence of the two sds concentrations. in . % ( . mm) sds, small oligomers with a molecular weight (mw) around - kda are formed. these oligomers appear to contain a large fraction of tetramers (vosough & barth, manuscript). in
. % ( . mm) sds, larger oligomers with mws around - kda are formed (barghorn et al., ). these larger oligomers, which most likely contain twelve aβ monomers, display a globular morphology and are therefore sometimes called globulomers (barghorn et al., ). all oligomers were analyzed by bn-page instead of by sds-page to avoid disruption of the non-cross-linked aβ oligomers by the high (> %) sds concentrations used in sds-page (bitan et al., ). as shown in lanes - and - of fig. , increasing licl concentrations have weak or no effects on the size or homogeneity of the formed aβ oligomers, as the bands retain their shape and intensity. only for the globulomers subjected to the highest licl concentration ( mm) is the intensity of the bn-page band slightly reduced (lane , fig. ).

ftir spectroscopy: effects of li+ ions on aβ oligomer structure

the secondary structures of aβ oligomers formed with different li+ concentrations were studied with ftir spectroscopy, where the amide i region ( - cm- ) is very sensitive to changes in the protein backbone conformation. the technique is also useful in amyloid research, given its capacity to characterize β-sheets (barth, ; sarroukh et al., ). fig. shows second derivative ir spectra for the amide i region of aβ globulomers (fig. a) and smaller oligomers (fig. b), prepared with different concentrations of li+ ions. monomeric aβ displayed a relatively broad band at - cm- , which is in agreement with the position of the band for disordered (random coil) polypeptides measured in d o (barth, ). for both types of aβ oligomers, this main band is much narrower and downshifted by about cm- , while a second smaller band appears around cm- . this split band pattern is indicative of an anti-parallel β-sheet conformation (cerf et al., ).
earlier studies in our laboratory have shown a relationship between the position and width of this main band, and the size and homogeneity of the aβ oligomers (vosough & barth, manuscript). the lower band position of the larger oligomers is in line with this relationship and our previous results, and confirms the different sizes of the oligomers produced at the two sds concentrations. we have recently observed that a number of transition metal ions induce significant effects on the main band position for aβ oligomers (manuscript in preparation). because the spectra for aβ oligomers formed with different amounts of licl generally superimpose on the ir spectra for the li+-free oligomers, with no shifts observed for the main band, it appears that li+ ions have no significant effect on the oligomers’ size or secondary structure.

discussion

lithium as a therapeutic agent

lithium has no known biological functions in the human body. li+ ions readily pass biological membranes, and are evenly distributed in tissues and easily eliminated via the kidneys (nordberg et al., ). li+ ions are however far from inert, and several well-defined medical conditions related to abnormal li+ concentrations exist. in low blood concentrations, li+ is used as a medication for bipolar and schizoaffective disorders (machado-vieira et al., ), but at higher concentrations li+ ions are neurotoxic (sellers et al., ; emilien & maloteaux, ; nordberg et al., ; wen et al., ). this leaves a narrow therapeutic window of . - .
mm that has to be closely monitored in order to prevent li+ intoxication, which is easily recognized by eeg (mignarri et al., ) and treatable by reducing the therapeutic dose. li+ intoxication (> . mm) presents as apathy, vertigo, tremor and gastrointestinal symptoms, and in more severe cases as confusion, psychosis, myoclonus and cardiac arrhythmias (nordberg et al., ). li+ intoxication also affects the kidneys, with polyuria and elevated u-albumin, although overt renal failure is rare (nordberg et al., ). treatment of bipolar and schizoaffective disorders with li+ has generated some knowledge about li+ metabolism in the human body (wen et al., ; medic et al., ). li+ accumulates to some extent in bone (birch, ), and chronic li+ effects are implicated in osteomalacia and severe osteoporosis (roos, ). patients treated with li+ also show an increased frequency of hypothyroidism and goitre, and widespread effects on several facets of the endocrine system have been noted (salata & klein, ). the negative effects of li+ on thyroid function have been clearly demonstrated in a study on populations in the andean mountains, where natural exposure to li+ is high, and where urinary li+ was found to correlate negatively with free thyroxine (t ) but positively with the pituitary gland hormone thyrotropin (broberg et al., ). the toxicity of li+ is further emphasized by studies from regions with naturally elevated concentrations of li+ in potable water, where reduced fetal size has been noted to correlate linearly with increases in blood li+ (harari et al., ).
to what extent li+ treatment reduces the development of ad symptoms is unclear (engel et al., ; mauer et al., ; nordberg et al., ; sutherland & duthie, ). bipolar disorder increases the risk of ad when compared to the general population, and li+ treatment seems to reduce this risk (velosa et al., ), but the mechanisms mediating this effect are far from elucidated (kerr et al., ). in rare cases even regular-dose long-time li+ therapy may cause severe intoxication of the central nervous system, characterized by cerebellar dysfunction and cognitive decline (emilien & maloteaux, ).

lithium interactions with the aβ peptide

the nmr (figs. and ), fluorescence quenching (fig. ), and cd (fig. ) experiments show that li+ ions interact weakly with the aβ peptide, where the binding affinity for the li+·aβ complex may be in the millimolar range. the ir and cd results show minor or no effects of li+ ions on the secondary structures of aβ monomers (fig. ) and aβ oligomers (fig. ). the li+ ions may have a small effect on aβ aggregation, with minor perturbations of the morphology of aggregated aβ fibrils (fig. ), and effects on the aβ aggregation kinetics (fig. ; table ) and aβ oligomer stability (fig. ) only at very high li+ concentrations. these results are in line with previous computer modeling results, which suggest small differences between how the monovalent k+, li+, and na+ alkali ions affect aβ oligomerization (huraskin & horn, ).
as aβ and aβ have identical n-terminal sequences, the two peptide variants should interact very similarly with li+ ions, which were found to bind to the n-terminal aβ region (fig. b). the weak affinity between aβ and li+ ions, and the fact that li+ does not efficiently compete with cu + ions for aβ binding (fig. ), suggest that li+ ions are not coordinated by specific binding ligands. instead, li+ likely engages in non-specific electrostatic interactions with the negatively charged aβ residues, i.e. d , e , d , e , e , and d (which are located in the n-terminal and central regions). the weak binding affinity to aβ peptides is not caused by li+ ions being monovalent, as e.g. monovalent ag+ ions display rather strong and specific binding to aβ peptides (wallin et al., ). moreover, divalent pb + and trivalent cr + ions do not bind strongly to aβ, while divalent cu +, mn + and zn + as well as tetravalent pb + ions do (faller, ; abelein et al., ; tiiman et al., ; wallin et al., ; wallin et al., ). thus, aβ/metal interactions are not governed by the charge of the metal ion, but rather by its specific properties, such as ionic radius and electron configuration ( s for li+).

it is illustrative to compare the aβ interactions with li+ ions to the well-studied interactions with cu + and zn + ions.
these two divalent ions display residue-specific interactions with aβ peptides, with binding affinities in the micromolar-nanomolar range and strong effects on aβ secondary structure, aggregation, and diffusion (danielsson et al., ; faller, ; lindgren et al., ; abelein et al., ; tiiman et al., ; owen et al., ). aβ binding to cu2+ and zn2+ is coordinated mainly by residue-specific interactions with the n-terminal his residues, i.e. h6, h13, and h14 (faller, ; lindgren et al., ; abelein et al., ; tiiman et al., ). the biological relevance of cu2+ and zn2+ ions in ad pathology is demonstrated by their dysregulation in ad patients (wang et al., ; szabo et al., ), and by their accumulation in plaques of aβ aggregates in ad brains (beauchemin & kisilevsky, ; lovell et al., ; miller et al., ). during neuronal signaling, cu2+ and zn2+ ions are released into the synaptic clefts (ayton et al., ), where they may interact with aβ peptides to initiate aβ aggregation (branch et al., ), or modulate the formation and toxicity of aβ oligomers (stefaniak & bal, ; wärmländer et al., ). the current results indicate that li+ ions are not able to compete with cu2+ or zn2+ ions for binding to aβ peptides, and should therefore not be able to influence the in vivo effects of cu2+ and zn2+ ions on aβ aggregation and toxicity.

although high concentrations of li+ showed some effects on aβ aggregation (figs. - ; ), these effects are likely at least partly related to ionic strength effects (abelein et al., ). under physiological ionic strength, no specific interactions are observed between aβ monomers and li+ ions (fig. c,d). thus, we conclude that the previously reported possible beneficial effects of li+ on alzheimer’s disease progression (mauer et al., ; sutherland & duthie, ; kerr et al., ; hampel et al., ; velosa et al., ) seem not to be caused by direct interactions between li+ ions and aβ peptides.
acknowledgments: we thank elizabeth (li) wang for helpful discussions. this work was supported by grants from the swedish alzheimer foundation and the swedish research council to ag, the swedish brain foundation to ag and ab, the magnus bergvall foundation to sw and pr, the ulla-carin lindquist als foundation to pr, and from olle engkvist's foundation, the stockholm region, and knut and alice wallenberg foundation to ab.

conflict of interest: the authors declare no conflict of interest.

references

abelein a, abrahams jp, danielsson j, gräslund a, jarvet j, luo j, tiiman a, wärmländer sk ( ). the hairpin conformation of the amyloid beta peptide is an important structural motif along the aggregation pathway. j biol inorg chem ( - ): - . doi: . /s - - - . abelein a, gräslund a, danielsson j ( ). zinc as chaperone-mimicking agent for retardation of amyloid beta peptide fibril formation. proc natl acad sci u s a ( ): - . doi: . /pnas. . abelein a, jarvet j, barth a, gräslund a, danielsson j ( ). ionic strength modulation of the free energy landscape of abeta peptide fibril formation. j am chem soc ( ): - . doi: . /jacs. b . adlard pa, bush ai ( ). metals and alzheimer's disease: how far have we come in the clinic? j alzheimers dis ( ): - . doi: . /jad- . agostinho p, cunha ra, oliveira c ( ). neuroinflammation, oxidative stress and the pathogenesis of alzheimer's disease. curr pharm des ( ): - . doi: . / . al-hilaly yk, williams tl, stewart-parker m, ford l, skaria e, cole m, bucher wg, morris kl, sada aa, thorpe jr, serpell lc ( ).
a central role for dityrosine crosslinking of amyloid-beta in alzheimer's disease. acta neuropathol commun : . doi: . / - - - . alies b, renaglia e, rozga m, bal w, faller p, hureau c ( ). cu(ii) affinity for the alzheimer's peptide: tyrosine fluorescence studies revisited. anal chem ( ): - . doi: . /ac u. ayton s, lei p, bush ai ( ). metallostasis in alzheimer's disease. free radic biol med : - . doi: . /j.freeradbiomed. . . . baldassarre m, barth a ( ). pushing the detection limit of infrared spectroscopy for structural analysis of dilute protein samples. analyst ( ): - . doi: . /c an e. barghorn s, nimmrich v, striebinger a, krantz c, p k, janson b, bahr m, schmidt m, bitner rs, harlan j, barlow e, ebert u, hillen h ( ). globular amyloid β-peptide - oligomer – a homogenous and stable neuropathological protein in alzheimer’s disease. j neurochem : – . baronio cm, baldassarre m, barth a ( ). insight into the internal structure of amyloid-beta oligomers by isotope-edited fourier transform infrared spectroscopy. phys chem chem phys ( ): - . doi: . /c cp b. barth a ( ). infrared spectroscopy of proteins. biochim biophys acta ( ): - . doi: . /j.bbabio. . . . beauchemin d, kisilevsky r ( ). a method based on icp-ms for the analysis of alzheimer's amyloid plaques. anal chem ( ): - . birch nj ( ). lithium accumulation in bone after oral administration in rat and in man. clin sci mol med ( ): - . doi: . /cs . bitan g, fradinger ea, spring sm, teplow db ( ). neurotoxic protein oligomers--what you see is not always what you get. amyloid ( ): - . doi: . / .
branch t, barahona m, dodson ca, ying l ( ). kinetic analysis reveals the identity of abeta- metal complex responsible for the initial aggregation of abeta in the synapse. acs chem neurosci ( ): - . doi: . /acschemneuro. b . broberg k, concha g, engstrom k, lindvall m, grander m, vahter m ( ). lithium in drinking water and thyroid function. environ health perspect ( ): - . doi: . /ehp. . budni j, feijo dp, batista-silva h, garcez ml, mina f, belletini-santos t, krasilchik lr, luz ap, schiavo gl, quevedo j ( ). lithium and memantine improve spatial memory impairment and neuroinflammation induced by beta-amyloid - oligomers in rats. neurobiol learn mem : - . doi: . /j.nlm. . . . caccamo a, oddo s, tran lx, laferla fm ( ). lithium reduces tau phosphorylation but not a beta or working memory deficits in a transgenic model with both plaques and tangles. am j pathol ( ): - . doi: . /ajpath. . . cardillo gm, de-paula vjr, ikenaga eh, costa lr, catanozi s, schaeffer el, gattaz wf, kerr ds, forlenza ov ( ). chronic lithium treatment increases telomere length in parietal cortex and hippocampus of triple-transgenic alzheimer's disease mice. j alzheimers dis ( ): - . doi: . /jad- . cerf e, sarroukh r, tamamizu-kato s, breydo l, derclaye s, dufrene yf, narayanaswami v, goormaghtigh e, ruysschaert jm, raussens v ( ). antiparallel beta-sheet: a signature structure of the oligomeric amyloid beta-peptide. biochem j ( ): - . doi: . /bj . chin-chan m, navarro-yepes j, quintanilla-vega b ( ). environmental pollutants as risk factors for neurodegenerative disorders: alzheimer and parkinson diseases. front cell neurosci : . doi: . /fncel. . . cho ch, urquidi j, singh s, wilse robinson g ( ). thermal offset viscosities of liquid h o, d o, and t o. j. phys. chem. b ( ): - . danielsson j, andersson a, jarvet j, gräslund a ( ). n relaxation study of the amyloid beta- peptide: structural propensities and persistence length. magn reson chem spec no: s - . doi: . /mrc. . 
danielsson j, jarvet j, damberg p, gräslund a ( ). translational diffusion measured by pfg‐nmr on full length and fragments of the alzheimer aβ( – ) peptide. determination of hydrodynamic radii of random coil peptides of varying length. magnetic resonance in chemistry ( ): s -s . danielsson j, pierattelli r, banci l, gräslund a ( ). high-resolution nmr studies of the zinc- binding site of the alzheimer's amyloid beta-peptide. febs j ( ): - . doi: . /j. - . . .x. decker m, munoz-torrero d ( ). special issue: "molecules against alzheimer". molecules ( ) doi: . /molecules . dell'osso l, del grande c, gesi c, carmassi c, musetti l ( ). a new look at an old drug: neuroprotective effects and therapeutic potentials of lithium salts. neuropsychiatr dis treat : - . doi: . /ndt.s . donix m, bauer m ( ). population studies of association between lithium and risk of neurodegenerative disorders. curr alzheimer res ( ): - . doi: . / . duce ja, bush ai, adlard pa ( ). role of amyloid-β–metal interactions in alzheimer’s disease. future neurol ( ): – . edelhoch h ( ). spectroscopic determination of tryptophan and tyrosine in proteins. biochemistry ( ): – . emilien g, maloteaux jm ( ). lithium neurotoxicity at low therapeutic doses hypotheses for causes and mechanism of action following a retrospective analysis of published case reports. acta neurol belg ( ): - . engel t, goni-oliver p, gomez de barreda e, lucas jj, hernandez f, avila j ( ). lithium, a potential protective drug in alzheimer's disease. neurodegener dis ( - ): - . doi: . / . faller p ( ).
copper and zinc binding to amyloid-beta: coordination, dynamics, aggregation, reactivity and metal-ion transfer. chembiochem ( ): - . doi: . /cbic. . forlenza ov, de-paula vj, diniz bs ( ). neuroprotective effects of lithium: implications for the treatment of alzheimer's disease and related neurodegenerative disorders. acs chem neurosci ( ): - . doi: . /cn . fornai f, longone p, cafaro l, kastsiuchenka o, ferrucci m, manca ml, lazzeri g, spalloni a, bellio n, lenzi p, modugno n, siciliano g, isidoro c, murri l, ruggieri s, paparelli a ( ). lithium delays progression of amyotrophic lateral sclerosis. proc natl acad sci u s a ( ): - . doi: . /pnas. . frozza rl, lourenco mv, de felice fg ( ). challenges for alzheimer's disease therapy: insights from novel mechanisms beyond memory defects. front neurosci : . doi: . /fnins. . . gade malmos k, blancas-mejia lm, weber b, buchner j, ramirez-alvarado m, naiki h, otzen d ( ). tht : a primer on the use of thioflavin t to investigate amyloid formation. amyloid ( ): - . doi: . / . . . ge w, jakobsson e ( ). systems biology understanding of the effects of lithium on affective and neurodegenerative disorders. front neurosci : . doi: . /fnins. . . ghalebani l, wahlström a, danielsson j, wärmländer sk, gräslund a ( ). ph-dependence of the specific binding of cu(ii) and zn(ii) ions to the amyloid-beta peptide. biochem biophys res commun ( ): - . doi: . /j.bbrc. . . . gibbons gs, lee vmy, trojanowski jq ( ). mechanisms of cell-to-cell transmission of pathological tau: a review. jama neurol ( ): - . doi: . /jamaneurol. . . glasoe pk, long fa ( ). use of glass electrodes to measure acidities in deuterium oxide. j phys chem : – . glenner gg, wong cw ( ). alzheimer's disease: initial report of the purification and characterization of a novel cerebrovascular amyloid protein. biochem biophys res commun ( ): - . goedert m ( ). tau filaments in neurodegenerative diseases. febs lett ( ): - . doi: . / - . . 
habib a, sawmiller d, li s, xiang y, rongo d, tian j, hou h, zeng j, smith a, fan s, giunta b, mori t, currier g, shytle dr, tan j ( ). lispro mitigates beta-amyloid and associated pathologies in alzheimer's mice. cell death dis ( ): e . doi: . /cddis. . . habib a, shytle rd, sawmiller d, koilraj s, munna sa, rongo d, hou h, borlongan cv, currier g, tan j ( ). comparing the effect of the novel ionic cocrystal of lithium salicylate proline (lispro) with lithium carbonate and lithium salicylate on memory and behavior in female appswe/ps de alzheimer's mice. j neurosci res ( ): - . doi: . /jnr. . hampel h, lista s, mango d, nistico r, perry g, avila j, hernandez f, geerts h, vergallo a, alzheimer precision medicine i ( ). lithium as a treatment for alzheimer's disease: the systems pharmacology perspective. j alzheimers dis ( ): - . doi: . /jad- . harari f, langeen m, casimiro e, bottai m, palm b, nordqvist h, vahter m ( ). environmental exposure to lithium during pregnancy and fetal size: a longitudinal study in the argentinean andes. environ int : - . doi: . /j.envint. . . . heppner fl, ransohoff rm, becher b ( ). immune attack: the role of inflammation in alzheimer disease. nat rev neurosci ( ): - . doi: . /nrn . huat tj, camats-perna j, newcombe ea, valmas n, kitazawa m, medeiros r ( ). metal toxicity links to alzheimer's disease and neuroinflammation. j mol biol ( ): - . doi: . /j.jmb. . . . huraskin d, horn ahc ( ). alkali ion influence on structure and stability of fibrillar amyloid-beta oligomers. j mol model ( ): . doi: . /s - - - .
kerr f, bjedov i, sofola-adesakin o ( ). molecular mechanisms of lithium action: switching the light on multiple targets for dementia using animal models. front mol neurosci : . doi: . /fnmol. . . kisby b, jarrell jt, agar me, cohen ds, rosin er, cahill cm, rogers jt, huang x ( ). alzheimer's disease and its potential alternative therapeutics. j alzheimers dis parkinsonism ( ) doi: . / - . . lee sj, nam e, lee hj, savelieff mg, lim mh ( ). towards an understanding of amyloid-beta oligomers: characterization, toxicity mechanisms, and inhibitors. chem soc rev ( ): - . doi: . /c cs g. lindgren j, segerfeldt p, sholts sb, gräslund a, karlström ae, wärmländer sk ( ). engineered non-fluorescent affibody molecules facilitate studies of the amyloid-beta (abeta) peptide in monomeric form: low ph was found to reduce abeta/cu(ii) binding affinity. j inorg biochem : - . doi: . /j.jinorgbio. . . . liu m, qian t, zhou w, tao x, sang s, zhao l ( ). beneficial effects of low-dose lithium on cognitive ability and pathological alteration of alzheimer's disease transgenic mice model. neuroreport ( ): - . doi: . /wnr. . lovell ma, robertson jd, teesdale wj, campbell jl, markesbery wr ( ). copper, iron and zinc in alzheimer's disease senile plaques. j neurol sci ( ): - . luo j, mohammed i, wärmländer sk, hiruma y, gräslund a, abrahams jp ( ). endogenous polyamines reduce the toxicity of soluble abeta peptide aggregates associated with alzheimer's disease. biomacromolecules ( ): - . doi: . /bm j. luo j, otero jm, yu ch, wärmländer sk, gräslund a, overhand m, abrahams jp ( ). inhibiting and reversing amyloid-beta peptide ( - ) fibril formation with gramicidin s and engineered analogues. chemistry ( ): - . doi: . /chem. . luo j, wärmländer sk, gräslund a, abrahams jp ( ). alzheimer peptides aggregate into transient nanoglobules that nucleate fibrils. biochemistry ( ): - . doi: . /bi . luo j, wärmländer sk, gräslund a, abrahams jp ( ). 
cross-interactions between the alzheimer disease amyloid-beta peptide and other amyloid proteins: a further aspect of the amyloid cascade hypothesis. j biol chem ( ): - . doi: . /jbc.r . . luo j, yu ch, yu h, borstnar r, kamerlin sc, gräslund a, abrahams jp, wärmländer sk ( ). cellular polyamines promote amyloid-beta (abeta) peptide fibrillation and modulate the aggregation pathways. acs chem neurosci ( ): - . doi: . /cn x. machado-vieira r, manji hk, zarate ca, jr. ( ). the role of lithium in the treatment of bipolar disorder: convergent evidence for neurotrophic effects as a unifying hypothesis. bipolar disord suppl : - . doi: . /j. - . . .x. mauer s, vergne d, ghaemi sn ( ). standard and trace-dose lithium: a systematic review of dementia prevention and other behavioral benefits. aust n z j psychiatry ( ): - . doi: . / . medic b, stojanovic m, stimec bv, divac n, vujovic ks, stojanovic r, colovic m, krstic d, prostran m ( ). lithium - pharmacological and toxicological aspects: the current state of the art. curr med chem ( ): - . doi: . / . mignarri a, chini e, rufa a, rocchi r, federico a, dotti mt ( ). lithium neurotoxicity mimicking rapidly progressive dementia. j neurol ( ): - . doi: . /s - - -z. miller lm, wang q, telivala tp, smith rj, lanzirotti a, miklossy j ( ). synchrotron-based infrared and x-ray imaging shows focalized accumulation of cu and zn co-localized with beta-amyloid deposits in alzheimer's disease. j struct biol ( ): - . doi: . /j.jsb. . . .
mital m, wezynfeld ne, fraczyk t, wiloch mz, wawrzyniak ue, bonna a, tumpach c, barnham kj, haigh cl, bal w, drew sc ( ). a functional role for abeta in metal homeostasis? n- truncation and high-affinity copper binding. angew chem int ed engl ( ): - . doi: . /anie. . modgil s, lahiri dk, sharma vl, anand a ( ). role of early life exposure and environment on neurodegeneration: implications on brain disorders. transl neurodegener : . doi: . / - - - . morris g, berk m ( ). the putative use of lithium in alzheimer's disease. curr alzheimer res ( ): - . doi: . / . morrison ke, dhariwal s, hornabrook r, savage l, burn dj, khoo tk, kelly j, murphy cl, al-chalabi a, dougherty a, leigh pn, wijesekera l, thornhill m, ellis cm, o'hanlon k, panicker j, pate l, ray p, wyatt l, young ca, copeland l, ealing j, hamdalla h, leroi i, murphy c, o'keeffe f, oughton e, partington l, paterson p, rog d, sathish a, sexton d, smith j, vanek h, dodds s, williams tl, steen in, clarke j, eziefula c, howard r, orrell r, sidle k, sylvester r, barrett w, merritt c, talbot k, turner mr, whatley c, williams c, williams j, cosby c, hanemann co, iman i, philips c, timings l, crawford se, hewamadduma c, hibberd r, hollinger h, mcdermott c, mils g, rafiq m, shaw pj, taylor a, waines e, walsh t, addison-jones r, birt j, hare m, majid t ( ). lithium in patients with amyotrophic lateral sclerosis (licals): a phase multicentre, randomised, double-blind, placebo-controlled trial. lancet neurol ( ): - . doi: . /s - ( ) - . nath s, agholme l, kurudenkandy fr, granseth b, marcusson j, hallbeck m ( ). spreading of neurodegenerative pathology via neuron-to-neuron transmission of beta-amyloid. j neurosci ( ): - . doi: . /jneurosci. - . . nečas d, klapetek p ( ). gwyddion: an open-source software for spm data analysis. central european journal of physics : - . doi: https://doi.org/ . . nery lr, eltz ns, hackman c, fonseca r, altenhofen s, guerra hn, freitas vm, bonan cd, vianna mr ( ). 
brain intraventricular injection of amyloid-beta in zebrafish embryo impairs cognition and increases tau phosphorylation, effects reversed by lithium. plos one ( ): e . doi: . /journal.pone. . nordberg g, fowler b, nordberg m, (eds). ( ). handbook on the toxicology of metals, elsevier. owen mc, gnutt d, gao m, wärmländer skts, jarvet j, gräslund a, winter r, ebbinghaus s, strodel b ( ). effects of in vivo conditions on amyloid aggregation. chem soc rev ( ): - . doi: . /c cs d. pan y, short jl, newman sa, choy khc, tiwari d, yap c, senyschyn d, banks wa, nicolazzo ja ( ). cognitive benefits of lithium chloride in app/ps mice are associated with enhanced brain clearance of beta-amyloid. brain behav immun : - . doi: . /j.bbi. . . . priebe ga, kanzawa mm ( ). reducing the progression of alzheimer's disease in down syndrome patients with micro-dose lithium. med hypotheses : . doi: . /j.mehy. . . prince m, wimo a, guerchet m, ali g-c, wu y-t, prina m ( ). world alzheimer report - the global impact of dementia. london, uk. querfurth hw, laferla fm ( ). alzheimer's disease. n engl j med ( ): - . doi: . /nejmra . rangachari v, moore bd, reed dk, sonoda lk, bridges aw, conboy e, hartigan d, rosenberry tl ( ). amyloid-beta( - ) rapidly forms protofibrils and oligomers by distinct pathways in low concentrations of sodium dodecylsulfate. biochemistry ( ): - . doi: . /bi s. regen f, hellmann-regen j, costantini e, reale m ( ). neuroinflammation and alzheimer's disease: implications for microglial activation. curr alzheimer res ( ): - . doi: . / .
rocha nkr, themoteo r, brentani h, forlenza ov, de paula vjr ( ). neuronal-glial interaction in a triple-transgenic mouse model of alzheimer's disease: gene ontology and lithium pathways. front neurosci : . doi: . /fnins. . . rockenstein e, torrance m, adame a, mante m, bar-on p, rose jb, crews l, masliah e ( ). neuroprotective effects of regulators of the glycogen synthase kinase- beta signaling pathway in a transgenic model of alzheimer's disease are associated with reduced amyloid precursor protein phosphorylation. j neurosci ( ): - . doi: . /jneurosci. - . . roos pm ( ). osteoporosis in neurodegeneration. j trace elem med biol ( ): - . doi: . /j.jtemb. . . . ryves wj, harwood aj ( ). lithium inhibits glycogen synthase kinase- by competition for magnesium. biochem biophys res commun ( ): - . doi: . /bbrc. . . salata r, klein i ( ). effects of lithium on the endocrine system: a review. j lab clin med ( ): - . sardar sinha m, ansell-schultz a, civitelli l, hildesjo c, larsson m, lannfelt l, ingelsson m, hallbeck m ( ). alzheimer's disease pathology propagation by exosomes containing toxic amyloid- beta oligomers. acta neuropathol ( ): - . doi: . /s - - - . sarroukh r, goormaghtigh e, ruysschaert jm, raussens v ( ). atr-ftir: a "rejuvenated" tool to investigate amyloid proteins. biochim biophys acta ( ): - . doi: . /j.bbamem. . . . selkoe dj, hardy j ( ). the amyloid hypothesis of alzheimer's disease at years. embo mol med ( ): - . doi: . /emmm. . sellers j, tyrer p, whiteley a, banks dc, barer dh ( ). neurotoxic effects of lithium with delayed rise in serum lithium levels. br j psychiatry : - . doi: . /bjp. . . . sengupta u, nilson an, kayed r ( ). the role of amyloid-beta oligomers in toxicity, propagation, and immunotherapy. ebiomedicine : - . doi: . /j.ebiom. . . . sofola-adesakin o, castillo-quan ji, rallis c, tain ls, bjedov i, rogers i, li l, martinez p, khericha m, cabecinha m, bahler j, partridge l ( ). 
lithium suppresses abeta pathology by inhibiting translation in an adult drosophila model of alzheimer's disease. front aging neurosci : . doi: . /fnagi. . . stefaniak e, bal w ( ). cu(ii) binding properties of n-truncated abeta peptides: in search of biological function. inorg chem ( ): - . doi: . /acs.inorgchem. b . sutherland c, duthie ac ( ). invited commentary on ... lithium treatment and risk for dementia in adults with bipolar disorder. br j psychiatry ( ): - . doi: . /bjp.bp. . . szabo st, harry gj, hayden km, szabo dt, birnbaum l ( ). comparison of metal levels between postmortem brain and ventricular fluid in alzheimer's disease and nondemented elderly controls. toxicol sci ( ): - . doi: . /toxsci/kfv . tiiman a, luo j, wallin c, olsson l, lindgren j, jarvet j, roos pm, sholts sb, rahimipour s, abrahams jp, karlström ae, gräslund a, wärmländer skts ( ). specific binding of cu(ii) ions to amyloid-beta peptides bound to aggregation-inhibiting molecules or sds micelles creates complexes that generate radical oxygen species. j alzheimers dis ( ): - . doi: . /jad- . trujillo-estrada l, jimenez s, de castro v, torres m, baglietto-vargas d, moreno-gonzalez i, navarro v, sanchez-varo r, sanchez-mejias e, davila jc, vizuete m, gutierrez a, vitorica j ( ). in vivo modification of abeta plaque toxicity as a novel neuroprotective lithium-mediated therapy for alzheimer's disease pathology. acta neuropathol commun : . doi: . / - - - . velosa j, delgado a, finger e, berk m, kapczinski f, de azevedo cardoso t ( ).
risk of dementia in bipolar disorder and the interplay of lithium: a systematic review and meta-analyses. acta psychiatr scand doi: . /acps. . vosough f, barth a (manuscript). characterization of homogeneous and heterogeneous amyloid-β oligomer preparations with biochemical methods and infrared spectroscopy reveals a correlation between infrared spectrum and oligomer size. wallin c, friedemann m, sholts sb, noormagi a, svantesson t, jarvet j, roos pm, palumaa p, gräslund a, wärmländer skts ( ). mercury and alzheimer's disease: hg(ii) ions display specific binding to the amyloid-beta peptide and hinder its fibrillization. biomolecules ( ): . doi: . /biom . wallin c, jarvet j, biverstål h, wärmländer s, danielsson j, gräslund a, abelein a ( ). metal ion coordination delays amyloid-beta peptide self-assembly by forming an aggregation-inert complex. j biol chem ( ): - . doi: . /jbc.ra . . wallin c, kulkarni ys, abelein a, jarvet j, liao q, strodel b, olsson l, luo j, abrahams jp, sholts sb, roos pm, kamerlin sc, gräslund a, wärmländer sk ( ). characterization of mn(ii) ion binding to the amyloid-beta peptide in alzheimer's disease. j trace elem med biol : - . doi: . /j.jtemb. . . . wallin c, sholts sb, Österlund n, luo j, jarvet j, roos pm, ilag l, gräslund a, wärmländer s ( ). alzheimer's disease and cigarette smoke components: effects of nicotine, pahs, and cd(ii), cr(iii), pb(ii), pb(iv) ions on amyloid-beta peptide aggregation. sci rep ( ): . doi: . /s - - - . wang x, wang w, li l, perry g, lee hg, zhu x ( ). oxidative stress and mitochondrial dysfunction in alzheimer's disease. biochim biophys acta ( ): - . doi: . /j.bbadis. . . . wang zx, tan l, wang hf, ma j, liu j, tan ms, sun jh, zhu xc, jiang t, yu jt ( ). serum iron, zinc, and copper levels in patients with alzheimer's disease: a replication study and meta- analyses. j alzheimers dis ( ): - . doi: . /jad- . wen j, sawmiller d, wheeldon b, tan j ( ). 
a review for lithium: pharmacokinetics, drug design, and toxicity. cns neurol disord drug targets ( ): - . doi: . / . wilson en, do carmo s, welikovitch la, hall h, aguilar lf, foret mk, iulita mf, jia dt, marks ar, allard s, emmerson jt, ducatenzeiler a, cuello ac ( ). np , a microdose lithium formulation, blunts early amyloid post-plaque neuropathology in mcgill-r-thy -app alzheimer-like transgenic rats. j alzheimers dis ( ): - . doi: . /jad- . wärmländer s, tiiman a, abelein a, luo j, jarvet j, söderberg kl, danielsson j, gräslund a ( ). biophysical studies of the amyloid beta-peptide: interactions with metal ions and small molecules. chembiochem ( ): - . doi: . /cbic. . wärmländer skts, Österlund n, wallin c, wu j, luo j, tiiman a, jarvet j, gräslund a ( ). metal binding to the amyloid-β peptides in the presence of biomembranes: potential mechanisms of cell toxicity. journal of biological inorganic chemistry : – xiang j, cao k, dong yt, xu y, li y, song h, zeng xx, ran ly, hong w, guan zz ( ). lithium chloride reduced the level of oxidative stress in brains and serums of app/ps double transgenic mice via the regulation of gsk beta/nrf /ho- pathway. int j neurosci ( ): - . doi: . / . . . yu f, zhang y, chuang dm ( ). lithium reduces bace overexpression, beta amyloid accumulation, and spatial learning deficits in mice with traumatic brain injury. j neurotrauma ( ): - . doi: . /neu. . . zhao l, gong n, liu m, pan x, sang s, sun x, yu z, fang q, zhao n, fei g, jin l, zhong c, xu t ( ).
beneficial synergistic effects of microdose lithium with pyrroloquinoline quinone in an alzheimer's disease mouse model. neurobiol aging ( ): - . doi: . /j.neurobiolaging. . . . Österlund n, kulkarni ys, misiaszek ad, wallin c, krüger dm, liao q, mashayekhy rad f, jarvet j, strodel b, wärmländer skts, ilag ll, kamerlin scl, gräslund a ( ). amyloid-beta peptide interactions with amphiphilic surfactants: electrostatic and hydrophobic effects. acs chem neurosci ( ): - . doi: . /acschemneuro. b .

table . kinetic parameters for aβ fibril formation, i.e. aggregation half-time (t1/2) and maximum aggregation rate (rmax), derived from fitting the curves in fig. to eq. .

                 | μm aβ | : li+:aβ | : li+:aβ | : li+:aβ
t1/2 [hours]     | . ± . | . ± . | . ± . | . ± .
rmax [hours⁻¹]   | . ± . | . ± . | . ± . | . ± .

figures

fig. . amyloid fibril formation monitored by tht aggregation. samples of μm aβ peptides in mm mes buffer, ph . , were incubated at + °c together with μm thioflavin-t and different concentrations of licl: μm – black; μm – red; μm – green; μm – blue. the circles represent average data points for four replicates, while the solid lines are derived from fitting to eq. .

fig. . solid state afm images (a -d ) of aggregates of µm aβ , incubated in mm mes buffer, ph . , for hours at + °c with rpm shaking, together with different concentrations of licl. a. control sample - no licl; b. µm licl; c. µm licl; d. mm licl. the height profile graphs (a -d ) below the afm images correspond to the cross-sections of aβ fibrils shown as white lines in the afm images.
fig. . nmr experiments for interactions between aβ monomers and li+ ions. (a) 2d 1h-15n-hsqc spectra of . μm 15n-labeled aβ peptides in mm mes buffer, ph . at + °c, recorded for aβ peptides alone (dark sky blue) and in the presence of either μm licl ( : aβ:li ratio; passion red) or . mm licl ( : aβ:li ratio; robin egg blue). (b) relative intensities of aβ residue crosspeaks shown in (a), after addition of licl in : , : , and : aβ:li ratios. (c and d) similar experiments as in a and b, but carried out in the presence of x pbs buffer, and for aβ:li ratios of : , : , and : .

fig. . nmr diffusion data for μm aβ peptides in sodium phosphate buffer, ph . at + °c, recorded both in absence (a) and presence of different li+ concentrations, i.e. μm (b), . mm (c), and . mm (d).

fig. . binding curves for the cu2+·aβ complex, obtained from the quenching effect of cu2+ ions on the intrinsic fluorescence of aβ residue y10. cucl2 was titrated to µm aβ in mm mes buffer, ph . at °c, both in the absence (red dots) and the presence (blue triangles) of mm licl.
fig. . cd spectra of μm aβ peptides at °c in mm sodium phosphate buffer, ph . . spectra were recorded for aβ in buffer only (black), after addition of mm micellar sds (brown), and after subsequent addition of between µm (blue) and µm (gray) of licl. the inset figure shows a close-up of the cd signals for the licl titration in the - nm range.

fig. . bn-page gel showing the effects of different concentrations of li+ ions on the formation of sds-stabilized aβ oligomers. lane : monomers prepared in mm naod. lanes - : aβ globulomers formed after hours of incubation with . % sds and different licl concentrations. lanes - : aβ oligomers formed after hours of incubation with . % sds and different licl concentrations.

fig. . second derivatives of infrared absorbance spectra for µm aβ monomers (black) and µm sds-stabilized aβ oligomers formed in absence (blue) and presence of . mm (red), mm (purple), and mm (green) of licl. the results are shown for aβ globulomers at . % sds (a) and smaller oligomers at . % sds (b).
the engineered peptide construct ncam -aβ inhibits aggregation of the human prion protein (prp)

maciej gielnik, lilia zhukova, igor zhukov, astrid gräslund, maciej kozak, sebastian k.t.s. wärmländer*

department of macromolecular physics, adam mickiewicz university, poznań, poland; maciejgielnik@amu.edu.pl (m.g.); mkozak@amu.edu.pl (m.k.)
institute of biochemistry and biophysics, polish academy of sciences, warszawa, poland; lilia@ibb.waw.pl (l.z.); igor@ibb.waw.pl (i.z.)
department of biochemistry and biophysics, arrhenius laboratories, stockholm university, stockholm, sweden; astrid@dbb.su.se (a.g.); seb@dbb.su.se (s.w.)
national synchrotron radiation centre solaris, jagiellonian university, kraków, poland.
* correspondence: seb@dbb.su.se; tel.: + - -

abstract: in prion diseases, the prion protein (prp) becomes misfolded and forms fibrillar aggregates, which are resistant to proteinase degradation and become responsible for prion infectivity and pathology. so far, no drug or treatment procedures have been approved for prion disease treatment. we have previously shown that engineered cell-penetrating peptide constructs can reduce the amount of prion aggregates in infected cells. the molecular mechanisms underlying this effect are however unknown. here, we use atomic force microscopy (afm) imaging to show that the aggregation of the human prp protein can be inhibited by equimolar amounts of the residues long engineered peptide construct ncam -aβ.

keywords: creutzfeldt-jakob disease; afm imaging; amyloid; drug design; drug transport; protein-peptide binding
introduction

prion and amyloid diseases are both characterized by aggregation of misfolded proteins or peptides (jaunmuktane and brandner, ; miller, ; sengupta and udgaonkar, ; verma et al., ), such as the prion (prp) protein (creutzfeldt-jakob disease), α-synuclein (parkinson’s disease), and amyloid-β (aβ) and tau (alzheimer’s disease). many of these proteins and peptides may co-aggregate or at least influence each other’s aggregation (luo et al., , ; ren et al., ; wallin et al., ). factors that modulate the aggregation of one of these proteins, such as small molecules, potential drug compounds, lipids, and metal ions, can often also modulate the aggregation processes of other proteins in this family (ambadi thody et al., ; chemerovski-glikman et al., ; gielnik et al., ; owen et al., ; richman et al., ; robinson and pinheiro, ; wallin et al., ; wärmländer et al., ; wärmländer et al., ; Österlund et al., ). this suggests that the underlying mechanisms may be the same in prion and amyloid diseases (jaunmuktane and brandner, ; jucker and walker, ; miller, ). prion aggregates are however particularly infectious, as they spread between cells (jaunmuktane and brandner, ; jucker and walker, ), and are not degraded by cellular processes such as proteinase digestion (jaunmuktane and brandner, ; löfgren et al., ; söderberg et al., ).
the toxic species in amyloid and prion diseases are generally considered to be small toxic oligomeric aggregates (sengupta and udgaonkar, ; verma et al., ), but so far no drugs or treatments that target such aggregates have been approved against prion diseases (hyeon et al., ; lee et al., ; mashima et al., ). potential drug molecules may interfere with oligomer formation in various ways: by reducing production of the protein, by inhibiting its aggregation, by diverting the aggregation pathway(s) towards non-toxic forms, or by reducing the lifetime of the toxic forms, for example by promoting rapid aggregation into larger non-toxic aggregates.

we have previously demonstrated anti-prion properties in short peptide constructs (up to residues) with sequences derived from the unprocessed n-termini of mouse and bovine prion proteins: such prp-derived peptides induced lower amounts of prion aggregates resistant to proteinase k in prion-infected cells (löfgren et al., ; söderberg et al., ). the prp-derived peptides consisted of an n-terminal signal peptide segment (different for mouse and bovine prp), together with a conserved positively charged and hydrophobic hexapeptide (kkrpkp) corresponding to the first six residues of the processed prp protein. our earlier studies had shown that peptides with such sequences were able to interact with and penetrate cell membranes (lundberg et al., ; magzoub et al., ; magzoub et al., ; oglecka et al., ).
the anti-prion effects of the prp-derived peptides were lost when the kkrpkp hexapeptide was coupled to various peptides with cell-penetrating properties (söderberg et al., ). the anti-prion effects were however retained when kkrpkp was coupled to the signal sequence of the neural cell adhesion molecule- (i.e., ncam - ) (söderberg et al., ). the mouse prp - segment and the ncam - -kkrpkp construct are both amyloidogenic in themselves, as they form amyloid fibrils by self-aggregation (mukundan et al., ; pansieri et al., ). the ncam - -kkrpkp construct was recently shown to inhibit aggregation of the amyloid-β peptide involved in alzheimer’s disease (henning-knechtel et al., ), and to promote in vitro aggregation of the amyloid protein s a (pansieri et al., ), which is involved in amyloid-related and other inflammatory processes (horvath et al., ; wang et al., ; wang et al., ). almost identical results were obtained for a similar amyloidogenic residue-construct, i.e. ncam - -kklvff (from here onwards: ncam -aβ) (pansieri et al., ). the klvff sequence originates from the hydrophobic core (residues - ) of the aβ peptide: this pentapeptide is known to inhibit aggregation of the full-length aβ peptide (tjernberg et al., ). in the ncam -aβ construct, an additional lysine residue was added to the klvff sequence for increased solubility (pansieri et al., ). the molecular properties of the ncam -aβ sequence and its segments are shown in table , including
hydrophobicity values calculated according to the wimley-white whole residue hydrophobicity scale (wang et al., ; wimley and white, ).

as the ncam -aβ construct inhibits fibrillation of the aβ peptide (henning-knechtel et al., ), but promotes (co-)aggregation of the s a protein (pansieri et al., ), it is unclear how the construct may affect the aggregation of the prp protein (if at all). here, we use atomic force microscopy (afm) imaging to investigate if there is a direct effect of the ncam -aβ construct on the in vitro aggregation of the human prp protein. answering this question might help clarify the mechanisms underlying the previously observed beneficial effects of such peptide constructs on prp infectivity (löfgren et al., ; söderberg et al., ).

table . primary sequences and molecular properties of the human prp protein, the ncam -aβ peptide construct, and its parts.
columns: protein; sequence; isoelectric point (pi); molecular weight [g mol- ]; net charge at ph ; theoretical hydrophobicity [kcal mol- ]
huprp - ; uniprot id: p ( aa); . + -
ncam - -k-aβ - (ncam -aβ); nh -mlrtkdliwtlfflgtavskklvff-nh ; . . + - .
ncam - (ncam ); nh -mlrtkdliwtlfflgtavs-nh ; . . + - .
kklvff; nh -kklvff-cooh; . + - .

materials and methods

sample preparation

human recombinant prion protein (huprp) was prepared according to a previously published protocol (morillas et al., ; zahn et al., ), albeit with some modifications. the plasmid contained the full-length ( - )huprp protein in fusion with an n-terminal histag, and the thrombin cleavage site was cloned into the prsetb vector (invitrogen, usa).
the construct was expressed in e. coli (bl - de ) grown in lb growth medium with µg/ml ampicillin. expression was induced by isopropyl β-d-thiogalactopyranoside (iptg) at od = . . sonication of the lysates was performed in a buffer containing mm tris at ph , mm k hpo , mm glutathione (gsh), m guhcl, and . mm phenylmethane sulfonyl fluoride (pmsf). the solution was centrifuged and the supernatant loaded onto ni-nta resin (ge healthcare) and eluted with buffer e ( mm tris at ph . , mm k hpo , and mm imidazole). after washing the resin, the protein was purified with two-step dialysis, initially against mm phosphate buffer with . mm pmsf at ph . , and then against milli-q h o with . mm pmsf. after thrombin cleavage, the pure huprp protein (i.e., with the histag removed) was concentrated using an amicon ultra . ml centrifugal filter (merck & co., usa) with an nmwl cutoff of kda. the final protein concentration was determined by spectrophotometry using an extinction coefficient of ε = m - cm- (gasteiger et al., ). the quality of the final protein was checked by mass spectrometry (molecular mass da - table ).

the ncam -aβ peptide (table ) was purchased as a custom order from the polypeptide group (france) in lyophilized form. the peptide was dissolved in milli-q water, and its concentration was determined via triplicate uv absorption measurements at nm, using a ds- spectrophotometer (denovix, usa) and an extinction coefficient of ε = m - cm- (gasteiger et al., ).
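the concentration determinations described above rest on the beer-lambert law, a = ε·c·l, so that c = a / (ε·l). the short python sketch below illustrates that arithmetic for a set of triplicate readings. it is only an illustration under stated assumptions: the absorbance readings, the extinction coefficient, and the 1 cm path length are hypothetical placeholders and are not values taken from this study, and the function names are invented for this example.

```python
def concentration_from_absorbance(absorbance, epsilon_m1_cm1, path_length_cm=1.0):
    """Beer-Lambert law: A = epsilon * c * l, hence c = A / (epsilon * l).

    Returns the concentration in mol/L when epsilon is given in M^-1 cm^-1
    and the path length in cm.
    """
    if epsilon_m1_cm1 <= 0 or path_length_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance / (epsilon_m1_cm1 * path_length_cm)


def mean_concentration(absorbances, epsilon_m1_cm1, path_length_cm=1.0):
    """Average the concentrations computed from replicate absorbance readings."""
    values = [
        concentration_from_absorbance(a, epsilon_m1_cm1, path_length_cm)
        for a in absorbances
    ]
    return sum(values) / len(values)


# Hypothetical triplicate readings and extinction coefficient (assumptions,
# chosen for illustration only; not the values used in the study).
readings = [0.110, 0.112, 0.108]
epsilon = 5500.0  # M^-1 cm^-1, placeholder value

conc_molar = mean_concentration(readings, epsilon)
print(f"mean concentration: {conc_molar * 1e6:.1f} µM")
```

averaging the per-reading concentrations, rather than reporting a single reading, mirrors the triplicate measurement described for the peptide; the spread between replicates can be inspected in the same way if an error estimate is wanted.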
sample incubation

the initial buffer in the huprp solution was exchanged to ultrapure water by triplicate diafiltration using an amicon ultra . ml centrifugal filter (merck & co., usa) with an nmwl cutoff of kda. samples of . µm ncam -aβ, . µm ncam -aβ, . µm huprp, and . µm ncam -aβ + . µm huprp were then prepared in mm sodium phosphate buffer, ph . , with mm nacl and m urea. the urea was added as it has previously been shown to promote unfolding of the native prp structure, which is the first step towards aggregation (julien et al., ; swietnicki et al., ). the samples were incubated for hours at ℃ with magnetic stirring at rpm. subsamples were taken out for afm imaging (below) after and hours, respectively.

atomic force microscopy (afm) imaging

incubated samples ( μl) were transferred to freshly cleaved mica plates and left to adsorb for min, rinsed three times with μl of pure water, and then dried under a gentle flow of nitrogen. afm imaging was performed on a jpk nanowizard (bruker, germany) afm unit using tap al-g cantilevers (ted pella inc., usa) in air intermittent contact mode. the scan rate was . - . hz, the scan area size was μm x μm or μm x μm, with x or x pixel resolution respectively. the afm images were analyzed using the gwyddion . software (necas and klapetek, ).

results and discussion

afm images of the aggregation products present in the samples after hours of incubation are shown in figs. a-d. the sample of . µm huprp readily self-aggregated into long fibrils (fig. a) that are approximately - nm thick (judged by their measured height, as width is not accurately represented in afm images). this is somewhat thinner than, but still in line with, the results of previous studies on prp fibrils (terry and wadsworth, ; vazquez-fernandez et al., ; yamaguchi and kuwata, ). a few very large aggregate clumps, over nm high, can also be seen (fig. a). for ncam -aβ, the . µm sample shows small aggregate clumps (fig
b). some of them are relatively large, with heights over nm, and may or may not be early stages of fibrillar aggregates (luo et al., ). the . µm ncam -aβ sample shows numerous mature fibrils, about – nm high, together with aggregate clumps (fig. c). the greater abundance of fibrils for . µm of ncam -aβ confirms earlier results showing that ncam -aβ self-aggregates faster at higher concentrations (pansieri et al., ).

figure . afm images of: (a) . µm huprp protein; (b) . µm ncam -aβ peptide; (c) . µm ncam -aβ peptide; and (d) . µm huprp protein + . µm ncam -aβ peptide. all samples in a-d were incubated for hours. (e) . µm huprp protein + . µm ncam -aβ peptide, incubated for hours. all studied samples were incubated at ℃ in mm sodium phosphate buffer, ph . , with mm nacl and m urea, and with magnetic stirring at rpm. the white scale bars are nm.

interestingly, the sample containing both . µm ncam -aβ and . µm huprp displays no fibrils, but only numerous small aggregate clumps, about nm high (fig. d). even after hours no fibrils can be seen, but the aggregate clumps are then fewer and larger, around – nm high (fig. e).
as it cannot be ruled out that these small aggregate clumps will eventually form fibrils, it is not possible to tell if fibrillation is completely inhibited, or if the fibrillation rate is merely significantly reduced. nonetheless, the absence of fibrillar aggregates of huprp in the presence of equimolar concentrations of ncam -aβ clearly shows that the peptide construct directly interacts with the huprp protein and interferes with its aggregation. as both molecules are positively charged (table ), it stands to reason that they interact mainly via hydrophobic forces.

the aggregation-inhibiting effect of ncam -aβ (fig. ) appears to provide an explanation, at a molecular level, for our earlier observations that such peptide constructs significantly reduce the levels of prion aggregates in prion-infected cells (löfgren et al., ; söderberg et al., ). as both the ncam -aβ peptide and the huprp protein can form amyloid fibrils by themselves (figs. a and c), the two molecules may interact via cross-aggregation, to form smaller non-fibrillar co-aggregates (fig. e) that could be less toxic than pure huprp aggregates (luo et al., , ). if so, the huprp/ncam -aβ interactions would be similar to the interactions between aβ and ncam -aβ (henning-knechtel et al., ). in any case, the huprp/ncam -aβ interactions are very different from the interactions between ncam -aβ and s a protein, where amyloid aggregation is promoted (pansieri et al., ). because the ncam -aβ construct has different effects on different aggregating proteins, it would be interesting to study how this construct might affect the aggregation of other disease-related prion proteins, such as those involved in animal diseases like bovine spongiform encephalopathy (bse), chronic wasting disease in cervids, and sheep scrapie (vazquez-fernandez et al., ).
conclusions

our atomic force microscopy images show that the in vitro aggregation of the human prp protein is inhibited by equimolar amounts of the residues long engineered peptide ncam -aβ. thus, a very likely molecular-level explanation for our previous observation that such cell-penetrating peptide constructs can reduce the amount of prion aggregates in infected cells is that these peptide constructs directly interact with the prp protein and prevent its fibrillation.

funding: the research of mg, iz, lz and mk was supported by an opus research grant ( / /b/st / ) from the national science centre (poland). ag was supported by grants from the swedish research council and from byggmästare engkvist's foundation.

conflicts of interest: the authors declare no conflict of interest.

references

ambadi thody, s., mathew, m.k., udgaonkar, j.b., . mechanism of aggregation and membrane interactions of mammalian prion protein. biochim biophys acta biomembr.
chemerovski-glikman, m., rozentur-shkop, e., richman, m., grupi, a., getler, a., cohen, h.y., shaked, h., wallin, c., wärmländer, s.k., haas, e., gräslund, a., chill, j.h., rahimipour, s., .
self-assembled cyclic d,l-alpha-peptides as generic conformational inhibitors of the alpha-synuclein aggregation and toxicity: in vitro and mechanistic studies. chemistry , - .
gasteiger, e., hoogland, c., gattiker, a., duvaud, s.e., wilkins, m.r., appel, r.d., amos, b., . protein identification and analysis tools on the expasy server, in: walker, j.m. (ed.), the proteomics protocols handbook. humana press, pp. - .
gielnik, m., pietralik, z., zhukov, i., szymańska, a., kwiatek, w.m., kozak, m., . prp ( – ) peptide from unstructured n-terminal domain of human prion protein forms amyloid-like fibrillar structures in the presence of zn + ions. rsc advances , – .
henning-knechtel, a., kumar, s., wallin, c., król, s., wärmländer, s., jarvet, j., esposito, g., kirmizialtin, s., gräslund, a., hamilton, a.d., magzoub, m., . designed cell-penetrating peptide inhibitors of amyloid-beta aggregation and cytotoxicity. cell reports physical science , .
horvath, i., iashchishyn, i.a., moskalenko, r.a., wang, c., wärmländer, s.k.t.s., wallin, c., gräslund, a., kovacs, g.g., morozova-roche, l.a., . co-aggregation of pro-inflammatory s a with alpha-synuclein in parkinson's disease: ex vivo and in vitro studies. j neuroinflammation , .
hyeon, j.w., noh, r., choi, j., lee, s.m., lee, y.s., an, s.s.a., no, k.t., lee, j., . bmd - , a novel benzoxazole derivative, shows a potent anti-prion activity and prolongs the mean survival in an animal model of prion disease. exp neurobiol , - .
jaunmuktane, z., brandner, s., . the role of prion-like mechanisms in neurodegenerative diseases. neuropathol appl neurobiol.
jucker, m., walker, l.c., . propagation and spread of pathogenic protein assemblies in neurodegenerative diseases. nat neurosci , - .
julien, o., chatterjee, s., thiessen, a., graether, s.p., sykes, b.d., . differential stability of the bovine prion protein upon urea unfolding. protein sci , - .
kristensen, m., birch, d., morck nielsen, h., .
applications and challenges for use of cell-penetrating peptides as delivery vectors for peptide and protein cargos. int j mol sci .
lee, s.m., kim, s.s., kim, h., kim, s.y., . therpa v : an update of a small molecule database related to prion protein regulation and prion disease progression. prion , - .
lundberg, p., magzoub, m., lindberg, m., hallbrink, m., jarvet, j., eriksson, l.e., langel, u., gräslund, a., . cell membrane translocation of the n-terminal ( - ) part of the prion protein. biochem biophys res commun , - .
luo, j., wärmländer, s.k., gräslund, a., abrahams, j.p., . alzheimer peptides aggregate into transient nanoglobules that nucleate fibrils. biochemistry , - .
luo, j., wärmländer, s.k., gräslund, a., abrahams, j.p., . reciprocal molecular interactions between the abeta peptide linked to alzheimer's disease and insulin linked to diabetes mellitus type ii. acs chem neurosci , - .
luo, j., wärmländer, s.k., gräslund, a., abrahams, j.p., . cross-interactions between the alzheimer disease amyloid-beta peptide and other amyloid proteins. a further aspect of the amyloid cascade hypothesis. j biol chem , .
löfgren, k., wahlström, a., lundberg, p., langel, u., gräslund, a., bedecs, k., . antiprion properties of prion protein-derived cell-penetrating peptides. faseb j , - .
magzoub, m., oglecka, k., pramanik, a., eriksson, g.l.e., gräslund, a., . membrane perturbation effects of peptides derived from the n-termini of unprocessed prion proteins. biochim biophys acta , - .
magzoub, m., sandgren, s., lundberg, p., oglecka, k., lilja, j., wittrup, a., eriksson, g.l.e., langel, u., belting, m., gräslund, a., . n-terminal peptides from unprocessed prion proteins enter cells by macropinocytosis. biochem biophys res commun , - .
mashima, t., lee, j.h., kamatari, y.o., hayashi, t., nagata, t., nishikawa, f., nishikawa, s., kinoshita, m., kuwata, k., katahira, m., . development and structural determination of an anti-prp(c) aptamer that blocks pathological conformational conversion of prion protein. sci rep , .
miller, g., . neurodegeneration. could they all be prion diseases? science , - .
morillas, m., swietnicki, w., gambetti, p., surewicz, w.k., . membrane environment alters the conformational structure of the recombinant human prion protein. j biol chem , - .
mukundan, v., maksoudian, c., vogel, m.c., chehade, i., katsiotis, m.s., alhassan, s.m., magzoub, m., . cytotoxicity of prion protein-derived cell-penetrating peptides is modulated by ph but independent of amyloid formation. arch biochem biophys , - .
necas, d., klapetek, p., . gwyddion: an open-source software for spm data analysis. central european journal of physics , - .
oglecka, k., lundberg, p., magzoub, m., eriksson, g.l.e., langel, u., gräslund, a., . relevance of the n-terminal nls-like sequence of the prion protein for membrane perturbation effects. biochim biophys acta , - .
owen, m.c., gnutt, d., gao, m., wärmländer, s.k.t.s., jarvet, j., gräslund, a., winter, r., ebbinghaus, s., strodel, b., . effects of in vivo conditions on amyloid aggregation. chem soc rev , - .
pansieri, j., ostojic, l., iashchishyn, i.a., magzoub, m., wallin, c., wärmländer, s., gräslund, a., nguyen ngoc, m., smirnovas, v., svedruzic, z., morozova-roche, l.a., . pro-inflammatory s a protein aggregation promoted by ncam peptide constructs. acs chem biol , - .
ren, b., zhang, y., zhang, m., liu, y., zhang, d., gong, x., feng, z., tang, j., chang, y., zheng, j., .
fundamentals of cross-seeding of amyloid proteins: an introduction. j mater chem b , - .
richman, m., wilk, s., chemerovski, m., wärmländer, s.k., wahlström, a., gräslund, a., rahimipour, s., . in vitro and mechanistic studies of an antiamyloidogenic self-assembled cyclic d,l-alpha-peptide architecture. j am chem soc , - .
robinson, p.j., pinheiro, t.j., . phospholipid composition of membranes directs prions down alternative aggregation pathways. biophys j , - .
santuccione, a., sytnyk, v., leshchyns'ka, i., schachner, m., . prion protein recruits its neuronal receptor ncam to lipid rafts to activate p fyn and to enhance neurite outgrowth. j cell biol , - .
schmitt-ulms, g., legname, g., baldwin, m.a., ball, h.l., bradon, n., bosque, p.j., crossin, k.l., edelman, g.m., dearmond, s.j., cohen, f.e., prusiner, s.b., . binding of neural cell adhesion molecules (n-cams) to the cellular prion protein. j mol biol , - .
sengupta, i., udgaonkar, j.b., . structural mechanisms of oligomer and amyloid fibril formation by the prion protein. chem commun (camb) , - .
swietnicki, w., morillas, m., chen, s.g., gambetti, p., surewicz, w.k., . aggregation and fibrillization of the recombinant human prion protein huprp - . biochemistry , - .
söderberg, k.l., guterstam, p., langel, u., gräslund, a., . targeting prion propagation using peptide constructs with signal sequence motifs. arch biochem biophys , - .
terry, c., wadsworth, j.d.f., . recent advances in understanding mammalian prion structure: a mini review. front mol neurosci , .
tjernberg, l.o., näslund, j., lindqvist, f., johansson, j., karlström, a.r., thyberg, j., terenius, l., nordstedt, c., . arrest of beta-amyloid fibril formation by a pentapeptide ligand. j biol chem , - .
vazquez-fernandez, e., young, h.s., requena, j.r., wille, h., . the structure of mammalian prions and their aggregates. int rev cell mol biol , - .
verma, m., vats, a., taneja, v., . toxic species in amyloid disorders: oligomers or mature fibrils. ann indian acad neurol , - .
wallin, c., hiruma, y., wärmländer, s., huvent, i., jarvet, j., abrahams, j.p., gräslund, a., lippens, g., luo, j., . the neuronal tau protein blocks in vitro fibrillation of the amyloid-beta (abeta) peptide at the oligomeric stage. j am chem soc , - .
wallin, c., luo, j., jarvet, j., wärmländer, s.k.t.s., gräslund, a., . the amyloid-b peptide in amyloid formation processes: interactions with blood proteins and naturally occurring metal ions. israel journal of chemistry , - .
wang, c., iashchishyn, i.a., kara, j., fodera, v., vetri, v., sancataldo, g., marklund, n., morozova-roche, l.a., . proinflammatory and amyloidogenic s a induced by traumatic brain injury in mouse model. neurosci lett , - .
wang, c., klechikov, a.g., gharibyan, a.l., wärmländer, s.k.t.s., jarvet, j., zhao, l., jia, x., narayana, v.k., shankar, s.k., olofsson, a., brännström, t., mu, y., gräslund, a., morozova-roche, l.a., . the role of pro-inflammatory s a in alzheimer's disease amyloid-neuroinflammatory cascade. acta neuropathol , - .
wang, g., li, x., wang, z., . apd : the antimicrobial peptide database as a tool for research and education. nucleic acids res , d - .
wimley, w.c., white, s.h., . experimentally determined hydrophobicity scale for proteins at membrane interfaces. nat struct biol , - .
wärmländer, s.k.t.s., tiiman, a., abelein, a., luo, j., jarvet, j., söderberg, k.l., danielsson, j., gräslund, a., . biophysical studies of the amyloid beta-peptide: interactions with metal ions and small molecules.
chembiochem , - .
wärmländer, s.k.t.s., Österlund, n., wallin, c., wu, j., luo, j., tiiman, a., jarvet, j., gräslund, a., . metal binding to the amyloid-beta peptides in the presence of biomembranes: potential mechanisms of cell toxicity. j biol inorg chem , - .
yamaguchi, k.i., kuwata, k., . formation and properties of amyloid fibrils of prion protein. biophys rev , - .
zahn, r., von schroetter, c., wüthrich, k., . human prion proteins expressed in escherichia coli and purified by high-affinity column refolding. febs lett , - .
Österlund, n., kulkarni, y.s., misiaszek, a.d., wallin, c., kruger, d.m., liao, q., mashayekhy rad, f., jarvet, j., strodel, b., wärmländer, s.k.t.s., ilag, l.l., kamerlin, s.c.l., gräslund, a., . amyloid-beta peptide interactions with amphiphilic surfactants: electrostatic and hydrophobic effects. acs chem neurosci , - .

-gingerol interferes with amyloid-beta (aβ) peptide aggregation

elina berntsson, suman paul, sabrina b. sholts, jüri jarvet, andreas barth, astrid gräslund, sebastian k. t. s. wärmländer*

department of biochemistry and biophysics, stockholm university, sweden.
department of anthropology, national museum of natural history, smithsonian institution, washington, dc, usa.
the national institute of chemical physics and biophysics, tallinn, estonia.
* correspondence: seb@dbb.su.se; tel.: + - -
international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / abstract alzheimer’s disease (ad) is the most prevalent age-related cause of dementia. ad affects millions of people worldwide, and to date there is no cure. the pathological hallmark of ad brains is deposition of amyloid plaques, which mainly consist of amyloid-β (aβ) peptides, commonly or residues long, that have aggregated into amyloid fibrils. intermediate aggregates in the form of soluble aβ oligomers appear to be highly neurotoxic. cell and animal studies have previously demonstrated positive effects of the molecule -gingerol on ad pathology. gingerols are the main active constituents of the ginger root, which in many cultures is a traditional nutritional supplement for memory enhancement. here, we use biophysical experiments to characterize in vitro interactions between -gingerol and aβ peptides. our experiments with atomic force microscopy imaging, and nuclear magnetic resonance and thioflavin-t fluorescence spectroscopy, show that the hydrophobic -gingerol molecule interferes with formation of aβ aggregates, but does not interact with aβ monomers. thus, together with its favourable toxicity profile, -gingerol appears to display many of the desired properties of an anti-ad compound. key words: alzheimer’s disease; amyloid aggregation; neurodegeneration; ginger; therapeutics; dementia .cc-by-nc-nd . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . 
doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / introduction alzheimer’s disease (ad) is a progressive and currently incurable neurodegenerative disorder, and the leading cause of age-related dementia worldwide (frozza et al., ; querfurth and laferla, ). although ad brains typically display signs of neuroinflammation and oxidative stress (agostinho et al., ; regen et al., ; wang et al., b), the main characteristic lesions in ad brains are extracellular amyloid plaques (querfurth and laferla, ; selkoe and hardy, ), which mainly consist of insoluble fibrillar aggregates of amyloid-β (aβ) peptides (querfurth and laferla, ). the aβ peptides comprise - residues and are intrinsically disordered in aqueous solution. they have limited solubility in water due to the hydrophobicity of the central and c-terminal segments, which may fold into a hairpin conformation upon aggregation (abelein et al., ; baronio et al., ). the charged n-terminal segment of aβ peptides is hydrophilic and interacts readily with cationic molecules and metal ions (luo et al., a; owen et al., ; wärmländer et al., ). the aβ fibrils and plaques that characterize ad neuropathology are the end- products of aβ aggregation processes (owen et al., ; selkoe and hardy, ) that involve extra- and/or intracellular formation of intermediate, soluble, and likely neurotoxic aβ oligomers (luo et al., b; sengupta et al., ) which may transfer from neuron to neuron via e.g. exosomes (sardar sinha et al., ). oligomers of aβ appear to be the most cell-toxic species (sengupta et al., ). the formation of aβ oligomers is influenced by interactions with various entities such as cellular membranes, small molecules, other proteins, and metal ions (luo et al., a, b; owen et al., ; wärmländer et al., ; Österlund et al., a). 
significant effort has been put into finding suitable molecules (i.e., drug candidates) that may modulate the aβ aggregation processes (leshem et al., ; luo et al., ; richman et al., ), but so far no drug has been approved (frozza et al., ).

some investigations of potential anti-ad substances have focused on natural plant compounds, such as gingerols, which are phenolic phytochemical compounds present in the subterranean stem, or rhizome, of angiosperms of the ginger (zingiberaceae) family (wang et al., a). consumed worldwide as a spice and herbal medicine, the rhizome of ginger (zingiber officinale) has demonstrated anti-inflammatory, antioxidant, antiemetic, analgesic, and antimicrobial effects (sharifi-rad et al., ). ginger is a common ingredient in traditional healthy diets in many cultures (iranshahy and javadi, ; khodaie and sadeghpoor, ). according to arabian folk wisdom, ginger improves memory and enhances cognition (saenghong et al., ). gingerols are generally considered to be safe for humans (kaul and joshi, ; wang et al., a). yet, they are cytotoxic towards blood cancer and lung cancer cells (de lima et al., ; semwal et al., ), and in vitro studies have demonstrated positive effects also on bowel (jeong et al., ), breast (lee et al., ), ovarian (rhode et al., ), and pancreatic cancer (park et al., ). the major pharmacologically-active variant is -gingerol, which has been associated with the prevention and treatment of neurodegenerative diseases such as ad (choi et al., ; jeong et al., ; mohd sahardi and makpol, ; wang et al., a). 
its chemical structure is shown in fig. . the anti-oxidant and anti-inflammatory properties of -gingerol are potentially useful against ad (mohd sahardi and makpol, ), which may explain why -gingerol has been reported to reduce markers for neuroinflammation and oxidative stress, as well as decrease aβ levels, in mice and cell ad models (halawany et al., ; zeng et al., ). however, little is known about the molecular mechanisms by which -gingerol exerts its positive effects on the ad pathology models. for example, interactions between gingerols and aβ peptides have not been studied at the molecular level. here, we use biophysical techniques (liquid-phase fluorescence and nuclear magnetic resonance (nmr) spectroscopy together with solid-state atomic force microscopy (afm)) to investigate possible in vitro interactions between -gingerol and aβ peptides, and how such interactions may affect the aβ aggregation and amyloid formation processes.

figure . chemical structure for the hydrophobic plant metabolite -gingerol. mw = . g/mol.

materials and methods

reagents and sample preparation
-gingerol was purchased as a powder from sigma-aldrich inc. (usa), and dissolved in dmso (dimethyl sulfoxide). recombinant unlabeled or uniformly n-labeled aβ peptides, with the primary sequence daefr hdsgy evhhq klvff aedvg snkga iiglm vggvv , were purchased lyophilized from alexotech ab (umeå, sweden). the peptides were stored at - °c until used. the peptide concentration was determined by weight, and the peptide samples were dissolved to monomeric form immediately before each measurement. 
in brief, the peptides were dissolved in mm sodium hydroxide, ph , at a mg/ml concentration and sonicated in an ice-bath for at least three minutes to avoid having pre-formed aggregates in the peptide solutions. the peptide solution was then further diluted in mm buffer of either sodium phosphate or mes ( -[n-morpholino]ethanesulfonic acid) at ph . . all sample preparation steps were performed on ice.

tht fluorescence monitoring aβ aggregation kinetics
to monitor the effect of -gingerol on aβ aggregation kinetics, µm monomeric aβ peptides were incubated in mm mes buffer ph . in the presence of five different concentrations of -gingerol ( , , , , and µm) together with dmso ( . %, . %, %, % and %; vol/vol). additionally, a control sample without -gingerol but containing % dmso was prepared. all samples contained μm thioflavin t (tht), which is a benzothiazole dye that displays increased fluorescence intensity when bound to amyloid aggregates (gade malmos et al., ). the tht dye was excited at nm, and the fluorescence emission at nm was measured every five minutes in a -well plate in a fluostar omega microplate reader (bmg labtech, germany). the sample volume in each well was µl, four replicates per condition were measured, the temperature was + °c, and each five-minute cycle involved seconds of shaking at rpm. the assay was repeated three times. 
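the replicate handling described above can be sketched in python. this is a minimal illustration under stated assumptions, not the authors' analysis code: the function names `average_replicates` and `time_axis_hours` are hypothetical and the example intensities are made up. only the five-minute read interval and the four replicates per condition come from the text.

```python
from statistics import mean, stdev


def average_replicates(replicates):
    """Return pointwise mean and sample standard deviation across replicate traces.

    `replicates` is a list of equally long lists of fluorescence readings,
    one inner list per well (hypothetical data layout).
    """
    if len({len(trace) for trace in replicates}) != 1:
        raise ValueError("all replicate traces must have the same number of time points")
    columns = list(zip(*replicates))  # one tuple of readings per time point
    means = [mean(col) for col in columns]
    sds = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return means, sds


def time_axis_hours(n_points, interval_min=5.0):
    """Time stamps in hours for readings taken every `interval_min` minutes."""
    return [i * interval_min / 60.0 for i in range(n_points)]


# Made-up readings for four wells of one condition (arbitrary units).
wells = [
    [1.0, 2.0, 3.0],
    [1.0, 4.0, 3.0],
    [1.0, 2.0, 5.0],
    [1.0, 4.0, 5.0],
]
mean_curve, sd_curve = average_replicates(wells)
hours = time_axis_hours(len(mean_curve))
```

averaging before fitting gives one curve per condition, as in the figure showing average curves from four replicates; fitting each well separately and then averaging the fitted parameters is an equally plausible choice that the text does not rule out.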
even though the tht fluorescence signal reached its maximum value after about seven hours, the incubation in the microplate reader continued for hours to allow the samples to aggregate into mature fibrils that could be observed with afm imaging (below). to derive parameters for the aggregation kinetics, the tht fluorescence curves were fitted to the sigmoidal equation (eq. ):

$$F(t) = F_0 + m_0 t + \frac{F_\infty + m_\infty t}{1 + e^{-(t - \tau_{1/2})/\tau_{elon}}}$$

where $F_0$ and $F_\infty$ are the intercepts of the initial and final fluorescence intensity baselines, $m_0$ and $m_\infty$ are the slopes of the initial and final baselines, $\tau_{1/2}$ is the time needed to reach halfway through the elongation phase (i.e., aggregation half-time), and $\tau_{elon}$ is the elongation time constant (gade malmos et al., ). the apparent maximum rate constant for fibrillar growth, $r_{max}$, is defined as $1/\tau_{elon}$.

atomic force microscopy (afm) imaging of aβ fibrils
samples for afm imaging were taken from the samples used in the tht fluorescence measurements, after h of incubation. afm images were recorded for the two control samples of µm aβ in mes buffer, with and without % added dmso, and for the three samples of µm aβ together with µm, µm, and µm of -gingerol. droplets of µl incubated sample were placed on fresh silicon wafers (siegert wafer gmbh, germany) and allowed to sit for minutes. next, µl milli-q water was added to the droplets, and all excess fluid was removed immediately with a lint-free wipe. the wafers were left to dry in a covered container to protect from dust, and afm images were recorded on the same day. 
a neasnom scattering-type near-field optical instrument (neaspec gmbh, germany) was used to collect the afm images in tapping mode (Ω: khz, tapping amplitude - nm) using a pt/ir-coated monolithic arrow-ncpt si tip (nanoandmore gmbh, germany) with tip radius < nm. images were acquired on . x . µm scan-areas ( x -pixel size) under optimal scan-speed (i.e., . ms/pixel), and both topographic and mechanical phase images were recorded. images were minimally processed using the gwyddion software, where only a basic plane levelling was performed (nečas and klapetek, ).

nuclear magnetic resonance (nmr) spectroscopy
an avance mhz nmr spectrometer (bruker inc., usa) equipped with a cryogenic probe was used to record d h- n-hsqc spectra at + °c of . μm monomeric n-labeled aβ peptides ( μl), either in only mm sodium phosphate buffer at ph . ( / h o/d o), or in phosphate buffer together with mm sds (sodium dodecyl sulphate) detergent. as the critical micelle concentration (cmc) for sds is around mm (Österlund et al., b), most of the sds was present as micelles. both samples were titrated, first with additions of pure dmso, and then with -gingerol dissolved in dmso. the nmr data were processed with the topspin version . . software, and the aβ hsqc crosspeak assignments in buffer (danielsson et al., ) and in sds micelles (jarvet et al., ) are known from previous work.

results

tht fluorescence kinetics
fig. shows tht fluorescence intensity curves for µm aβ peptides, incubated in the presence of varying concentrations of -gingerol and dmso. 
these curves reflect the formation of amyloid aggregates, and they all display a generally sigmoidal shape. fitting eq. to the curves produces the kinetic parameters τ½, rmax, and τlag (table ). addition of dmso alone, which was used to dissolve the -gingerol, has minor effects on the aggregation kinetics: it slightly increases the lag time from . to . hrs and decreases the aggregation half time from . to . hrs (fig. , table ). with -gingerol, some additions produce aggregation kinetics that differ from the control samples. for example, addition of µm -gingerol appears to slow down the aggregation (τlag = . h; τ½ = . h), while addition of µm -gingerol appears to speed up the aggregation (τlag = . h; τ½ = . h). there is however variation in these measurements, and there is no overall trend of faster or slower kinetics for the series of -gingerol additions. thus, these data indicate that -gingerol has no systematic effect on aβ aggregation or amyloid formation.

figure . tht fluorescence curves showing the aggregation kinetics of µm aβ in mm mes buffer, ph . , at °c. black: buffer only; red: % dmso; blue: µm -gingerol; pink: µm -gingerol; green: µm -gingerol; dark blue: µm -gingerol; and purple: µm -gingerol. average curves from four replicates are shown.

table . kinetic parameters (τ½, τlag, and rmax) for fibril formation of µm aβ peptides, derived from fitting eq. to the tht fluorescence curves shown in fig. .
parameter | aβ control in buffer | aβ control in % dmso | + µm -gingerol | + µm -gingerol | + µm -gingerol | + µm -gingerol | + µm -gingerol
τ½ (hours) | . ± . | . ± . | . ± . | . ± . | . ± . | . ± . | . ± .
τlag (hours) | . ± . | . ± . | . ± . | . ± . | . ± . | . ± . | . ± .
rmax (hours- ) | . ± . | . ± . | . ± . | . ± . | . ± . | . ± . | . ± .

afm imaging
afm images were recorded for some of the samples used in the tht fluorescence measurements, i.e. the two control samples of µm aβ peptides in buffer with and without % dmso, and the samples with additions of µm, µm, and µm of -gingerol (fig. ). these samples were incubated for h, to ensure aggregation into the mature elongated fibrils seen in fig. a. incubation in the presence of % dmso produced similar fibrils, although together with small non-fibrillar clumps (fig. b). somewhat similar results, although with even more clumps, were obtained for the samples incubated together with and µm -gingerol, which also contained . % and . % dmso, respectively (figs. c and d). the sample with µm of -gingerol and % dmso does however display a different morphology, as it clearly contains more amorphous clumps than elongated fibrils (fig. e). when evaluating these samples, it is a confounding factor that dmso appears to slightly affect the fibril formation. the sample with µm -gingerol however contains % dmso (fig. e), i.e. the same amount of dmso as the control sample with dmso (fig. b). thus, the different morphologies of the aβ aggregates in these two samples are clearly caused by the added -gingerol and not by the dmso alone.

figure . afm images showing aggregates of µm aβ peptide. (a) aβ in buffer. (b) aβ in dmso. (c) aβ and µm -gingerol in dmso. (d) aβ and µm -gingerol in dmso. (e) aβ and µm -gingerol in dmso. top row: height profiles. bottom row: mechanical phase images.

nmr spectroscopy
nmr experiments were conducted to investigate possible molecular interactions between -gingerol and the monomeric aβ peptide. the finger-print region of the h, n-hsqc spectrum of μm monomeric n-labeled aβ peptide is shown in fig. (blue spectrum), both for aβ in buffer and for aβ bound to sds micelles. the sds micelles were here used as a simple model for a membrane environment that is suitable for nmr studies (Österlund et al., a; Österlund et al., b). in both environments, addition of dmso ( % in the buffer sample and % in the sample with sds micelles) induces chemical shifts of most crosspeaks (fig. , red spectra). this is consistent with previous nmr studies of aβ in dmso (wallin et al., ). addition of -gingerol dissolved in dmso increased the dmso concentration to % in the buffer sample and to % in the sample with sds micelles. this addition induces chemical shift changes for the nmr crosspeaks that are perfectly consistent with the changes induced by dmso alone (fig. , orange spectra). this shows that -gingerol does not have any strong interaction of its own with monomeric aβ , either in aqueous solution or in a membrane environment.

figure . d nmr h, n-hsqc spectra recorded at + °c for μm monomeric aβ peptide in mm sodium phosphate buffer, ph . , for (a) aβ in buffer alone, and (b) aβ bound to micelles of mm sds. the spectra were recorded before (blue) and after addition of dmso (red), and then after addition of . mm -gingerol in dmso (orange).
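the sigmoidal model used to derive the kinetic parameters from the tht curves can be illustrated with a short python sketch. it assumes the commonly used form F(t) = F0 + m0*t + (F_inf + m_inf*t) / (1 + exp(-(t - t_half)/tau_elon)), which matches the parameter definitions given in materials and methods, and it assumes that the lag time is computed as t_half - 2*tau_elon, a convention from the tht literature that the text does not state explicitly. the function names and all numerical values are hypothetical.

```python
import math


def tht_sigmoid(t, f0, m0, f_inf, m_inf, t_half, tau_elon):
    """Sigmoidal ThT model with sloping initial and final baselines (assumed form).

    f0, m0       : intercept and slope of the initial baseline
    f_inf, m_inf : intercept and slope of the final baseline term
    t_half       : aggregation half-time (same time unit as t)
    tau_elon     : elongation time constant
    """
    return f0 + m0 * t + (f_inf + m_inf * t) / (1.0 + math.exp(-(t - t_half) / tau_elon))


def derived_parameters(t_half, tau_elon):
    """Return (r_max, t_lag) from fitted t_half and tau_elon.

    r_max = 1 / tau_elon follows the definition in the text.
    t_lag = t_half - 2 * tau_elon is an ASSUMED convention, not taken from the text.
    """
    r_max = 1.0 / tau_elon
    t_lag = t_half - 2.0 * tau_elon
    return r_max, t_lag


# Hypothetical parameter values (hours), chosen only to illustrate the curve shape.
params = dict(f0=0.0, m0=0.0, f_inf=1.0, m_inf=0.0, t_half=5.0, tau_elon=0.5)
curve = [tht_sigmoid(i * 5.0 / 60.0, **params) for i in range(145)]  # 12 h at 5-min steps
r_max, t_lag = derived_parameters(params["t_half"], params["tau_elon"])
```

in practice the six parameters would be estimated by nonlinear least squares on the measured curves; the sketch only evaluates the model and the derived quantities, since the fitting software used is not named in the text.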
discussion
given the ancient history and cultural importance of ginger in many parts of the world (iranshahy and javadi, ; khodaie and sadeghpoor, ; saenghong et al., ), it is desirable to understand the molecular mechanisms behind its proposed benefits to human health. such mechanistic investigations may also expand ethnomedical research, which often focuses on population-level medical effects and exposure/uptake levels (sholts et al., ; wärmländer et al., ).

here, we show that -gingerol interferes with the aggregation of aβ peptide, by inducing aggregation into amorphous clumps rather than into elongated fibrils (fig. ). our tht fluorescence assays show that -gingerol has no systematic effect on the kinetics of the aβ aggregation process, and that approximately the same amount of amyloid aggregates is formed with and without -gingerol (fig. ). from a medical perspective, however, the most important aspect of aβ aggregation may not be the amount or speed of aggregation, but rather the properties of the aggregates. the neuronal death in ad appears to be mainly caused by small oligomeric aβ aggregates of unknown composition and structure (luo et al., b; sardar sinha et al., ; sengupta et al., ) that might disrupt cell membranes (wärmländer et al., ). 
thus, the observed interference of -gingerol with the aβ aggregation processes could provide a molecular explanation of the previously observed beneficial effects of gingerols on cell and animal models of ad pathology (choi et al., ; halawany et al., ; jeong et al., ; mohd sahardi and makpol, ; wang et al., a; zeng et al., ).

the nmr results show that -gingerol does not interact with monomeric aβ , either in aqueous solution or in membrane-mimicking micelles. thus, interaction appears to take place only when oligomers or larger aggregates have formed. this is not unreasonable, as aβ oligomers are considered to be more hydrophobic than the amphiphilic aβ monomers (wärmländer et al., ), and thus more likely to interact with the hydrophobic -gingerol molecules. in fact, the ideal ad drug is a molecule that interferes with toxic aβ aggregates but not with the aβ monomers, as the latter may have beneficial biological functions in their non-aggregated form (dominy et al., ; frozza et al., ; querfurth and laferla, ; rajendran and annaert, ). as a molecule that is non-toxic (kaul and joshi, ), easy to produce and administer, and small enough to easily pass through the blood-brain barrier, -gingerol has suitable properties for use as a drug. this study suggests that -gingerol may be used to combat ad by interfering with the aggregation of aβ peptides.

conflict of interest
the authors declare no conflicts of interest.

acknowledgments
we thank teodor svantesson and georgia pilkington for helpful discussions and advice.

references
abelein, a., abrahams, j. 
p., danielsson, j., gräslund, a., jarvet, j., luo, j., tiiman, a. and wärmländer, s. k. ( ). the hairpin conformation of the amyloid beta peptide is an important structural motif along the aggregation pathway. j biol inorg chem , - . agostinho, p., cunha, r. a. and oliveira, c. ( ). neuroinflammation, oxidative stress and the pathogenesis of alzheimer's disease. curr pharm des , - . baronio, c. m., baldassarre, m. and barth, a. ( ). insight into the internal structure of amyloid-beta oligomers by isotope-edited fourier transform infrared spectroscopy. phys chem chem phys , - . choi, j. g., kim, s. y., jeong, m. and oh, m. s. ( ). pharmacotherapeutic potential of ginger and its compounds in age-related neurological disorders. pharmacol ther , - . danielsson, j., andersson, a., jarvet, j. and gräslund, a. ( ). n relaxation study of the amyloid beta-peptide: structural propensities and persistence length. magn reson chem spec no, s - . de lima, r. m. t., dos reis, a. c., de menezes, a. p. m., santos, j. v. o., filho, j., ferreira, j. r. o., de alencar, m., da mata, a., khan, i. n., islam, a., uddin, s. j., ali, e. s., islam, m. t., tripathi, s., mishra, s. k., mubarak, m. s. and melo-cavalcante, a. a. c. ( ). protective and therapeutic potential of ginger (zingiber officinale) extract and [ ]-gingerol in cancer: a comprehensive review. phytother res , - . dominy, s. s., lynch, c., ermini, f., benedyk, m., marczyk, a., konradi, a., nguyen, m., haditsch, u., raha, d., griffin, c., holsinger, l. j., arastu-kapur, s., kaba, s., lee, a., ryder, m. i., potempa, b., mydel, p., hellvard, a., adamowicz, k., hasturk, h., walker, g. d., reynolds, e. c., faull, r. l. m., curtis, m. a., dragunow, m. and potempa, j. ( ). porphyromonas gingivalis in alzheimer's disease brains: evidence for disease causation and treatment with small-molecule inhibitors. sci adv , eaau . frozza, r. l., lourenco, m. v. and de felice, f. g. ( ). 
challenges for alzheimer's disease therapy: insights from novel mechanisms beyond memory defects. front neurosci , . gade malmos, k., blancas-mejia, l. m., weber, b., buchner, j., ramirez-alvarado, m., naiki, h. and otzen, d. ( ). tht : a primer on the use of thioflavin t to investigate amyloid formation. amyloid , - . halawany, a. m. e., sayed, n. s. e., abdallah, h. m. and dine, r. s. e. ( ). protective effects of gingerol on streptozotocin-induced sporadic alzheimer's disease: emphasis on inhibition of beta-amyloid, cox- , alpha-, beta - secretases and aph a. sci rep , . iranshahy, m. and javadi, b. ( ). diet therapy for the treatment of alzheimer’s disease in view of traditional persian medicine: a review. iranian journal of basic medical sciences , - . jarvet, j., danielsson, j., damberg, p., oleszczuk, m. and gräslund, a. ( ). positioning of the alzheimer abeta( - ) peptide in sds micelles using nmr and paramagnetic probes. j biomol nmr , - . jeong, c. h., bode, a. m., pugliese, a., cho, y. y., kim, h. g., shim, j. h., jeon, y. j., li, h., jiang, h. and dong, z. ( ). [ ]-gingerol suppresses colon cancer growth by targeting leukotriene a hydrolase. cancer res , - . jeong, j. k., moon, m. h., park, y. g., lee, j. h., lee, y. j., seol, j. w. and park, s. y. ( ). gingerol-induced hypoxia-inducible factor alpha inhibits human prion peptide-mediated neurotoxicity. phytother res , - . kaul, p. n. and joshi, b. s. ( ). alternative medicine: herbal drugs and their critical appraisal - part ii. in progress in drug research (e. jucker, ed., vol. , pp. - . 
birkhäuser, basel, switzerland. khodaie, l. and sadeghpoor, o. ( ). ginger from ancient times to the new outlook. jundishapur j nat pharm prod , e . lee, h. s., seo, e. y., kang, n. e. and kim, w. k. ( ). [ ]-gingerol inhibits metastasis of mda-mb- human breast cancer cells. j nutr biochem , - . leshem, g., richman, m., lisniansky, e., antman-passig, m., habashi, m., gräslund, a., wärmländer, s. k. t. s. and rahimipour, s. ( ). photoactive chlorin e is a multifunctional modulator of amyloid-beta aggregation and toxicity via specific interactions with its histidine residues. chem sci , - . luo, j., mohammed, i., wärmländer, s. k., hiruma, y., gräslund, a. and abrahams, j. p. ( a). endogenous polyamines reduce the toxicity of soluble abeta peptide aggregates associated with alzheimer's disease. biomacromolecules , - . luo, j., otero, j. m., yu, c. h., wärmländer, s. k., gräslund, a., overhand, m. and abrahams, j. p. ( ). inhibiting and reversing amyloid-beta peptide ( - ) fibril formation with gramicidin s and engineered analogues. chemistry , - . luo, j., wärmländer, s. k., gräslund, a. and abrahams, j. p. ( b). alzheimer peptides aggregate into transient nanoglobules that nucleate fibrils. biochemistry , - . luo, j., wärmländer, s. k., gräslund, a. and abrahams, j. p. ( a). cross-interactions between the alzheimer disease amyloid-beta peptide and other amyloid proteins: a further aspect of the amyloid cascade hypothesis. j biol chem , - . luo, j., wärmländer, s. k., gräslund, a. and abrahams, j. p. ( b). reciprocal molecular interactions between the abeta peptide linked to alzheimer's disease and insulin linked to diabetes mellitus type ii. acs chem neurosci , - . mohd sahardi, n. f. n. and makpol, s. ( ). ginger (zingiber officinale roscoe) in the prevention of ageing and degenerative diseases: review of current evidence. evid based complement alternat med , . nečas, d. and klapetek, p. ( ). gwyddion: an open-source software for spm data analysis. 
central european journal of physics , - . owen, m. c., gnutt, d., gao, m., wärmländer, s. k. t. s., jarvet, j., gräslund, a., winter, r., ebbinghaus, s. and strodel, b. ( ). effects of in vivo conditions on amyloid aggregation. chem soc rev , - . park, y. j., wen, j., bang, s., park, s. w. and song, s. y. ( ). [ ]-gingerol induces cell cycle arrest and cell death of mutant p -expressing pancreatic cancer cells. yonsei med j , - . querfurth, h. w. and laferla, f. m. ( ). alzheimer's disease. n engl j med , - . rajendran, l. and annaert, w. ( ). membrane trafficking pathways in alzheimer's disease. traffic , - . regen, f., hellmann-regen, j., costantini, e. and reale, m. ( ). neuroinflammation and alzheimer's disease: implications for microglial activation. curr alzheimer res , - . rhode, j., fogoros, s., zick, s., wahl, h., griffith, k. a., huang, j. and liu, j. r. ( ). ginger inhibits cell growth and modulates angiogenic factors in ovarian cancer cells. bmc complement altern med , . richman, m., wilk, s., chemerovski, m., wärmländer, s. k., wahlström, a., gräslund, a. and rahimipour, s. ( ). in vitro and mechanistic studies of an antiamyloidogenic self-assembled cyclic d,l-alpha-peptide architecture. j am chem soc , - . saenghong, n., wattanathorn, j., muchimapura, s., tongun, t., piyavhatkul, n., banchonglikitkul, c. and kajsongkram, t. ( ). zingiber officinale improves cognitive function of the middle-aged healthy women. evid based complement alternat med , . sardar sinha, m., ansell-schultz, a., civitelli, l., hildesjö, c., larsson, m., lannfelt, l., ingelsson, m. 
and hallbeck, m. ( ). alzheimer's disease pathology propagation by exosomes containing toxic amyloid-beta oligomers. acta neuropathol , - . selkoe, d. j. and hardy, j. ( ). the amyloid hypothesis of alzheimer's disease at years. embo mol med , - . semwal, r. b., semwal, d. k., combrinck, s. and viljoen, a. m. ( ). gingerols and shogaols: important nutraceutical principles from ginger. phytochemistry , - . sengupta, u., nilson, a. n. and kayed, r. ( ). the role of amyloid-beta oligomers in toxicity, propagation, and immunotherapy. ebiomedicine , - . sharifi-rad, m., varoni, e. m., salehi, b., sharifi-rad, j., matthews, k. r., ayatollahi, s. a., kobarfard, f., ibrahim, s. a., mnayer, d., zakaria, z. a., sharifi-rad, m., yousaf, z., iriti, m., basile, a. and rigano, d. ( ). plants of the genus zingiber as a source of bioactive phytochemicals: from tradition to pharmacy. molecules . sholts, s. b., smith, k., wallin, c., ahmed, t. m. and wärmländer, s. ( ). ancient water bottle use and polycyclic aromatic hydrocarbon (pah) exposure among california indians: a prehistoric health risk assessment. environmental health : a global access science source , . wallin, c., sholts, s. b., Österlund, n., luo, j., jarvet, j., roos, p. m., ilag, l., gräslund, a. and wärmländer, s. k. t. s. ( ). alzheimer's disease and cigarette smoke components: effects of nicotine, pahs, and cd(ii), cr(iii), pb(ii), pb(iv) ions on amyloid-beta peptide aggregation. sci rep , . wang, s., zhang, c., yang, g. and yang, y. ( a). biological properties of -gingerol: a brief review. nat prod commun , - . wang, x., wang, w., li, l., perry, g., lee, h. g. and zhu, x. ( b). oxidative stress and mitochondrial dysfunction in alzheimer's disease. biochimica et biophysica acta , - . wärmländer, s., tiiman, a., abelein, a., luo, j., jarvet, j., söderberg, k. l., danielsson, j. and gräslund, a. ( ). biophysical studies of the amyloid beta-peptide: interactions with metal ions and small molecules. chembiochem , - . 
wärmländer, s. k., sholts, s. b., erlandson, j. m., gjerdrum, t. and westerholm, r. ( ). could the health decline of prehistoric california indians be related to exposure to polycyclic aromatic hydrocarbons (pahs) from natural bitumen? environ health perspect , - . wärmländer, s. k. t. s., Österlund, n., wallin, c., wu, j., luo, j., tiiman, a., jarvet, j. and gräslund, a. ( ). metal binding to the amyloid-β peptides in the presence of biomembranes: potential mechanisms of cell toxicity. journal of biological inorganic chemistry , – . zeng, g. f., zong, s. h., zhang, z. y., fu, s. w., li, k. k., fang, y., lu, l. and xiao, d. q. ( ). the role of -gingerol on inhibiting amyloid beta protein-induced apoptosis in pc cells. rejuvenation res , - . Österlund, n., kulkarni, y. s., misiaszek, a. d., wallin, c., krüger, d. m., liao, q., mashayekhy rad, f., jarvet, j., strodel, b., wärmländer, s. k. t. s., ilag, l. l., kamerlin, s. c. l. and gräslund, a. .cc-by-nc-nd . international licensemade available under a (which was not certified by peer review) is the author/funder, who has granted biorxiv a license to display the preprint in perpetuity. it is the copyright holder for this preprintthis version posted january , . ; https://doi.org/ . / . . . doi: biorxiv preprint https://doi.org/ . / . . . http://creativecommons.org/licenses/by-nc-nd/ . / ( a). amyloid-beta peptide interactions with amphiphilic surfactants: electrostatic and hydrophobic effects. acs chem neurosci , - . Österlund, n., luo, j., wärmländer, s. k. t. s. and gräslund, a. ( b). membrane-mimetic systems for biophysical studies of the amyloid-beta peptide. biochim biophys acta proteins proteom. 